From renato.golin at linaro.org Mon Jul 1 00:37:10 2013 From: renato.golin at linaro.org (Renato Golin) Date: Mon, 1 Jul 2013 08:37:10 +0100 Subject: [LLVMdev] [LNT] Question about results reliability in LNT infrustructure In-Reply-To: References: <601606fe.54cc.13f74d000d9.Coremail.tanmx_star@yeah.net> <51CC62CD.9060705@grosser.es> <3AD17769-0EBF-44EB-B84F-000E29131C3E@apple.com> <4BB1DA76-77CE-4FF1-8B89-45ED4810EDD3@apple.com> <429E8CF2-36A0-4546-B1DB-78929EEDB4A0@apple.com> <51cd57cb.aac6b40a.0fc4.0bc3SMTPIN_ADDED_BROKEN@mx.google.com> <51cd8a21.cc14b40a.1278.7259SMTPIN_ADDED_BROKEN@mx.google.com> <680FA65D-AF7C-44BC-8DCA-A018CF782609@apple.com> <51CF93AD.8060200@grosser.es> <51D05A9D.6070005@grosser.es> Message-ID: On 30 June 2013 20:05, Anton Korobeynikov wrote: > But in any case, never trust someone who will claim he can reliably > estimate the variance from 3 data points. > One cannot stress this enough. ;) cheers, --renato -------------- next part -------------- An HTML attachment was scrubbed... URL: From renato.golin at linaro.org Mon Jul 1 00:47:34 2013 From: renato.golin at linaro.org (Renato Golin) Date: Mon, 1 Jul 2013 08:47:34 +0100 Subject: [LLVMdev] [LNT] Question about results reliability in LNT infrustructure In-Reply-To: References: <601606fe.54cc.13f74d000d9.Coremail.tanmx_star@yeah.net> <51CC62CD.9060705@grosser.es> <3AD17769-0EBF-44EB-B84F-000E29131C3E@apple.com> <4BB1DA76-77CE-4FF1-8B89-45ED4810EDD3@apple.com> <429E8CF2-36A0-4546-B1DB-78929EEDB4A0@apple.com> <51cd57cb.aac6b40a.0fc4.0bc3SMTPIN_ADDED_BROKEN@mx.google.com> <51cd8a21.cc14b40a.1278.7259SMTPIN_ADDED_BROKEN@mx.google.com> <680FA65D-AF7C-44BC-8DCA-A018CF782609@apple.com> <51CF93AD.8060200@grosser.es> Message-ID: On 30 June 2013 20:08, Anton Korobeynikov wrote: > > Getting 10 samples at different commits will give you similar accuracy if > > behaviour doesn't change, and you can rely on 10-point blocks before and > > after each change to have the same result. > Right. 
But this way you will have a 10-commit delay. So, you will need > 3-4 additional test runs to pinpoint the offending commit in the worst > case. > Well, 10 was an example, but yes, you'll always have N commit-groups delay. My assumption is that some (say 5) commit-groups delay is not a critical issue if it happens once in a while, as opposed to having to examine every hike on a range of several dozen commits. > This is why I proposed something like moving averages. > Moving average will "smooth" the result. So, only really big changes > will be caught by it. > Absolutely. Smoothing is bad, but it's better than what we have, and at least it would catch big regressions. Today, not even the big ones are being caught. You don't have to throw away the original data-points, you just run a moving average to pinpoint big changes, where the confidence that a regression occurred is high. In parallel, you can still use the same data-points to do more refined analysis, and even cross-reference multiple analyses' data to give you even more confidence. Anton and David, I could not agree with you more on what's necessary to have a good analysis, I just wish we had something cruder but sooner while we develop the perfect statistical model. I believe Chris is doing that now. So, whatever is wrong with his analysis, let's just wait and see how it turns out, and how we can improve further. For now, anything will be an improvement. cheers, --renato -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrew.zhogin at gmail.com Mon Jul 1 00:53:50 2013 From: andrew.zhogin at gmail.com (Andrew Zhogin) Date: Mon, 1 Jul 2013 14:53:50 +0700 Subject: [LLVMdev] IfConversion non-recursive patch. Message-ID: Hi. On our system we have a problem with the recursive IfConversion algorithm. Here is the patch for making it loop-based. Or do I need to send it to some other mailing list? -- Best regards, Andrew Zhogin.
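[Editor's note: the loop-based rewrite the patch performs follows the standard recursion-to-worklist pattern. A generic sketch with toy types (Node and its Children are placeholders, not the actual IfConversion code):

```cpp
#include <stack>
#include <vector>

// Toy stand-in for a block/region graph node.
struct Node {
  std::vector<Node *> Children;
};

// Recursive form: each nested region re-enters the function, so deeply
// nested input can exhaust the call stack.
static void visitRecursive(Node *N, std::vector<Node *> &Order) {
  Order.push_back(N);
  for (Node *C : N->Children)
    visitRecursive(C, Order);
}

// Loop-based form: the implicit call stack becomes an explicit worklist,
// so stack depth stays constant regardless of nesting.
static void visitIterative(Node *Root, std::vector<Node *> &Order) {
  std::stack<Node *> Work;
  Work.push(Root);
  while (!Work.empty()) {
    Node *N = Work.top();
    Work.pop();
    Order.push_back(N);
    // Push children in reverse so they pop in source order,
    // matching the recursive visitation order exactly.
    for (auto It = N->Children.rbegin(), E = N->Children.rend(); It != E; ++It)
      Work.push(*It);
  }
}
```

Both functions produce the same preorder; only the stack discipline changes.]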
-------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: IfConversion.diff Type: application/octet-stream Size: 10990 bytes Desc: not available URL: From renato.golin at linaro.org Mon Jul 1 01:17:16 2013 From: renato.golin at linaro.org (Renato Golin) Date: Mon, 1 Jul 2013 09:17:16 +0100 Subject: [LLVMdev] [LNT] Question about results reliability in LNT infrustructure In-Reply-To: References: <601606fe.54cc.13f74d000d9.Coremail.tanmx_star@yeah.net> <51CC62CD.9060705@grosser.es> <3AD17769-0EBF-44EB-B84F-000E29131C3E@apple.com> <4BB1DA76-77CE-4FF1-8B89-45ED4810EDD3@apple.com> <429E8CF2-36A0-4546-B1DB-78929EEDB4A0@apple.com> <51cd57cb.aac6b40a.0fc4.0bc3SMTPIN_ADDED_BROKEN@mx.google.com> <51cd8a21.cc14b40a.1278.7259SMTPIN_ADDED_BROKEN@mx.google.com> <680FA65D-AF7C-44BC-8DCA-A018CF782609@apple.com> <51CF93AD.8060200@grosser.es> Message-ID: On 1 July 2013 06:51, James Courtier-Dutton wrote: > Another option is to take a deterministic approach to measurement. The > code should executive the same cpu instructions every time it is run, so > some method to measure just these instructions should be attempted. Maybe > processing qemu logs when llvm is run inside qemu might give a possible > solution? > Hi James, This looks simpler on paper. First, no emulator will give you accurate cycle count, or accurate execution sequence, so it's virtually impossible (and practically irrelevant) to benchmark on models. Second, measuring "relevant code" is what a benchmark is all about, but instrumenting it (emulators, profilers, etc) to separate matters is making irrelevant what was not. A good benchmark can time just the relevant part and run it thousands/millions of times to improve accuracy. Ours are not all good benchmarks, most are not even benchmarks. 
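[Editor's note: the "good benchmark" shape Renato describes — time only the kernel, repeat it enough that one measurement dwarfs the timer's resolution, and keep the best of several samples — can be sketched as follows; kernel() is a hypothetical placeholder for the code under test:

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Placeholder hot path; in a real benchmark this is the isolated kernel.
static long kernel(long n) {
  long sum = 0;
  for (long i = 0; i < n; ++i)
    sum += i * i;
  return sum;
}

// Run the kernel itersPerSample times per timed sample, take several
// samples, and report the minimum (least noise-contaminated) one.
static double bestSampleSeconds(unsigned samples, unsigned itersPerSample) {
  std::vector<double> times;
  for (unsigned s = 0; s < samples; ++s) {
    auto start = std::chrono::steady_clock::now();
    long sink = 0;
    for (unsigned i = 0; i < itersPerSample; ++i)
      sink += kernel(1000);
    auto end = std::chrono::steady_clock::now();
    // Keep the result observable so the loop is not optimized away.
    volatile long guard = sink;
    (void)guard;
    times.push_back(std::chrono::duration<double>(end - start).count());
  }
  return *std::min_element(times.begin(), times.end());
}
```

The inner repetition is what makes the measured interval large relative to the "minimum measuring distance" discussed below.]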
We're timing the execution of programs, taking into account OS context switches, CPU schedulers, disk I/O and many other unpredictable (and unrelated) things. The scientific approach is to run multiple times and improve accuracy, but your accuracy will always be no more than half of the minimum measuring distance. So, if we don't increase the run time to make the minimum measuring distance irrelevant, no amount of statistics will give you more accuracy. As an example, I ran "hello world" on my laptop and on my chromebook. My laptop gives me 0.001s run time with the occasional 0.002s. My chromebook is never less than 0.010s with 0.012s being the average. That's start-up, libraries and OS interruptions time, mostly. Some of the benchmarks (http://llvm.org/perf/db_default/v4/nts/12944) take between 0.010s and 0.035s to run, which really means nothing at that level of noise. Anton, David and Chris are absolutely correct that smoothing the curve will give no real insight on the quality of the results, but it will filter out most false positives. But that's not enough, not even a decent statistical analysis. We need benchmarks to be what they're supposed to: benchmarks. If we know a test application is not suitable for benchmarking, stop timing it. If we want to time an application, isolate the hot paths, run them multiple times, etc. One of the original assumptions on the test-suite was to NOT change the applications, because it would be easier to just add a new version, if we ever did. I'm not sure that time saved is really paying off. It's my opinion that we do need to change the application and we do need a different approach to community benchmarks. cheers, --renato -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From David.Tweed at arm.com Mon Jul 1 02:35:00 2013 From: David.Tweed at arm.com (David Tweed) Date: Mon, 1 Jul 2013 10:35:00 +0100 Subject: [LLVMdev] [LNT] Question about results reliability in LNT infrustructure In-Reply-To: References: <601606fe.54cc.13f74d000d9.Coremail.tanmx_star@yeah.net> <51CC62CD.9060705@grosser.es> <3AD17769-0EBF-44EB-B84F-000E29131C3E@apple.com> <4BB1DA76-77CE-4FF1-8B89-45ED4810EDD3@apple.com> <429E8CF2-36A0-4546-B1DB-78929EEDB4A0@apple.com> <51cd57cb.aac6b40a.0fc4.0bc3SMTPIN_ADDED_BROKEN@mx.google.com> <51cd8a21.cc14b40a.1278.7259SMTPIN_ADDED_BROKEN@mx.google.com> <680FA65D-AF7C-44BC-8DCA-A018CF782609@apple.com> <51CF93AD.8060200@grosser.es> Message-ID: Just some general observations: Firstly, just to note that when I talk about looking at what statisticians have developed I’m not being snobbish, it’s that pretty much any methodology will show up big effects it’s getting the best “power” on small effects when you’ve got marginal sample sizes that’s tricky and where a lot of people have already spent a long time thinking about these things. On Jun 30, 2013 8:12 PM, "Anton Korobeynikov" > wrote: > > > Getting 10 samples at different commits will give you similar accuracy if > > behaviour doesn't change, and you can rely on 10-point blocks before and > after each change to have the same result. > Right. But this way you will have 10-commits delay. So, you will need > 3-4 additional test runs to pinpoint the offending commit in the worst > case. > > > This is why I proposed something like moving averages. > Moving average will "smooth" the result. So, only really big changes > will be caught by it. > Just to state the obvious, statistics is best able to detect small effects the fewer extraneous things you try to estimate precisely. So I don’t quite see why an appropriately robust change-point estimator isn’t what we’d like to use here. (Someone earlier in the thread suggested it wasn’t, but I didn’t follow why.) 
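[Editor's note: a toy version of the change-point idea — choose the split of the sample series into "before"/"after" windows that minimizes the summed squared deviation from each window's mean. Real estimators add a significance test on top; this sketch only locates the most likely split:

```cpp
#include <cstddef>
#include <vector>

// Sum of squared deviations from the mean over V[Begin, End).
static double sse(const std::vector<double> &V, size_t Begin, size_t End) {
  double Mean = 0;
  for (size_t i = Begin; i < End; ++i)
    Mean += V[i];
  Mean /= double(End - Begin);
  double S = 0;
  for (size_t i = Begin; i < End; ++i)
    S += (V[i] - Mean) * (V[i] - Mean);
  return S;
}

// Returns the index where the "after" region starts, i.e. the split that
// best explains the series as two constant levels plus noise.
static size_t changePoint(const std::vector<double> &V) {
  size_t Best = 1;
  double BestCost = sse(V, 0, 1) + sse(V, 1, V.size());
  for (size_t k = 2; k < V.size(); ++k) {
    double Cost = sse(V, 0, k) + sse(V, k, V.size());
    if (Cost < BestCost) {
      BestCost = Cost;
      Best = k;
    }
  }
  return Best;
}
```

With 2-3 samples per commit on each side, the windows David describes map directly onto the "before"/"after" regions here.]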
In such a case you can use the 2-3 results from several consecutive commits in the “before” region and 2-3 results from several consecutive results in the after region, which seems a reasonable fit for the experimental situation. (My objection to smoothing is just that it’s summarising data before using a statistical test for no good reason, not that tracking samples over a window seems problematic.) | Like any result in statistics, the result should be quoted together with a +/- figure derived from the statistical method used. Generally, low sample size means high +/-. “Yes, but...” ☺ That’s absolutely true, but even +/- figures can be overly optimistic/overly pessimistic depending how well the actual distributions in practice match the assumptions about the distributions implicit in the statistical test. (As you can probably tell, I’m heavily Bayesian and regard statistics as ways of coherently assigning numbers to your beliefs and assumptions, along with new data, so making assumptions – that are going to be re-examined as things progress -- is fine; objective, assumption-free statistics doesn’t really exist for me.) Cheers, Dave -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From eirini_dit at windowslive.com Mon Jul 1 02:55:16 2013 From: eirini_dit at windowslive.com (Eirini _) Date: Mon, 1 Jul 2013 12:55:16 +0300 Subject: [LLVMdev] Problem with building llvm and running project Message-ID: Hello, i am new to LLVM and i want to create my own project with a cpp file which calls llvm functions and then run it. I download clang source, llvm source and compiler-rt source. 
I tried to configure and build llvm using this http://llvm.org/docs/GettingStarted.html#getting-started-with-llvm but it failed because the .h files included on top of Hello.cpp couldn't be found. Can you tell me the instructions about what I have to do in order to build llvm? Do I need to create a new build folder? What do I have to set in the Path variable? After building llvm I would like to run my own project, let's say Hello.cpp found in the /lib/Transforms folder in the llvm source. Thanks, Eirini -------------- next part -------------- An HTML attachment was scrubbed... URL: From tanmx_star at yeah.net Mon Jul 1 08:18:02 2013 From: tanmx_star at yeah.net (Star Tan) Date: Mon, 1 Jul 2013 23:18:02 +0800 (CST) Subject: [LLVMdev] [Polly][GSOC2013] FastPolly -- SCOP Detection Pass In-Reply-To: <51CF7D1A.8020200@grosser.es> References: <4033006f.2d60.13f92656fcc.Coremail.tanmx_star@yeah.net> <51CF7D1A.8020200@grosser.es> Message-ID: <38376141.5097.13f9acfe200.Coremail.tanmx_star@yeah.net> >> (3) About detecting scop regions in bottom-up order. >> Detecting scop regions in bottom-up order can significantly speed up the scop detection pass. However, as I have discussed with Sebastian, detecting scops in bottom-up order and top-down order will lead to different results. As a result, we should not change the detection order. > >Sebastian had a patch for this. Does his patch improve the scop >detection time? LNT testing results for Sebastian's patch file can be seen on http://188.40.87.11:8000/db_default/v4/nts/recent_activity (Run Order: ScopDetect130615). You can compare ScopDetect130615 (Polly with Sebastian's patch) to pOpt130615 (Polly without Sebastian's patch). The results do not seem to show significant performance improvements with the bottom-up patch for LLVM test-suite benchmarks. You are right. I think I should first focus on some simple examples, such as oggenc, to understand where the scop detection pass spends its time.
After that, we can then investigate the scop detection order. Bests, Star Tan -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias at grosser.es Mon Jul 1 08:47:06 2013 From: tobias at grosser.es (Tobias Grosser) Date: Mon, 01 Jul 2013 08:47:06 -0700 Subject: [LLVMdev] [Polly][GSOC2013] FastPolly -- SCOP Detection Pass In-Reply-To: <4b8db8c3.4ccb.13f9a80ea9e.Coremail.tanmx_star@yeah.net> References: <4033006f.2d60.13f92656fcc.Coremail.tanmx_star@yeah.net> <51CF7D1A.8020200@grosser.es> <4b8db8c3.4ccb.13f9a80ea9e.Coremail.tanmx_star@yeah.net> Message-ID: <51D1A47A.1000801@grosser.es> On 07/01/2013 06:51 AM, Star Tan wrote: >> Great. Now we have two test cases we can work with. Can you > >> upload the LLVM-IR produced by clang -O0 (without Polly)? > Since tramp3d-v4.ll is to large (19M with 267 thousand lines), I would focus on the oggenc benchmark at firat. > I attached the oggenc.ll (LLVM-IR produced by clang -O0 without Polly), which compressed into the file oggenc.tgz. Sounds good. >> 2) Check why the Polly scop detection is failing >> >> You can use 'opt -polly-detect -analyze' to see the most common reasons >> the scop detection failed. We should verify that we perform the most >> common and cheap tests early. >> > I also attached the output file oggenc_polly_detect_analyze.log produced by "polly-opt -O3 -polly-detect -analyze oggenc.ll". Unfortunately, it only dumps valid scop regions. At first, I thought to dump all debugging information by "-debug" option, but it will dump too many unrelated information produced by other passes. Do you know any option that allows me to dump debugging information for the "-polly-detect" pass, but at the same time disabling debugging information for other passes? I really propose to not attach such large files. ;-) To dump debug info of just one pass you can use -debug-only=polly-detect. However, for performance measurements, you want to use a release build to get accurate numbers. 
Another flag that is interesting is the flag '-stats'. It gives me the following information: 4 polly-detect - Number of bad regions for Scop: CFG too complex 183 polly-detect - Number of bad regions for Scop: Expression not affine 103 polly-detect - Number of bad regions for Scop: Found base address alias 167 polly-detect - Number of bad regions for Scop: Found invalid region entering edges 59 polly-detect - Number of bad regions for Scop: Function call with side effects appeared 725 polly-detect - Number of bad regions for Scop: Loop bounds can not be computed 93 polly-detect - Number of bad regions for Scop: Non canonical induction variable in loop 8 polly-detect - Number of bad regions for Scop: Others 53 polly-detect - Number of regions that a valid part of Scop This seems to suggest that we most scops fail due to loop bounds that can not be computed. It would be interesting to see what kind of expressions these are. In case SCEV often does not deliver a result, this may be one of the cases where bottom up scop detection would help a lot, as outer regions are automatically invalidated if we can not get a SCEV for the loop bounds of the inner regions. However, I still have the feeling the test case is too large. You can reduce it I propose to first run opt with 'opt -O3 -polly -disable-inlining -time-passes'. You then replace all function definitions with s/define internal/define/. After this preprocessing you can use a regexp such as "'<,'>s/define \([^{}]* \){\_[^{}]*}/declare \1" to replace function definitions with their declaration. You can use this to binary search for functions that have a large overhead in ScopDetect time. I tried this a little, but realized that no matter if I removed the first or the second part of a module, the relative scop-detect time always went down. This is surprising. If you see similar effects, it would be interesting to investigate. 
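[Editor's note: the reduction Tobias describes — replacing every "define ... { ... }" with a "declare ..." so one can binary-search for the functions dominating scop-detect time — can be done with a small helper instead of editor regexps. A naive sketch (it assumes the `define internal` → `define` rename has already been done, and its brace matcher ignores braces inside strings, so it is a quick experiment, not a robust tool):

```cpp
#include <string>

// Turn every "define <sig> { <body> }" in an LLVM-IR module dump into
// "declare <sig>", dropping the body via brace matching.
static std::string stripFunctionBodies(const std::string &IR) {
  std::string Out;
  size_t i = 0;
  while (i < IR.size()) {
    size_t Def = IR.find("define ", i);
    if (Def == std::string::npos) {
      Out += IR.substr(i); // no more definitions; copy the tail
      break;
    }
    size_t Brace = IR.find('{', Def);
    if (Brace == std::string::npos) {
      Out += IR.substr(i);
      break;
    }
    Out += IR.substr(i, Def - i);
    // Keep the signature, swap the keyword.
    Out += "declare " + IR.substr(Def + 7, Brace - (Def + 7));
    // Skip the body by matching braces.
    int Depth = 1;
    size_t j = Brace + 1;
    while (j < IR.size() && Depth > 0) {
      if (IR[j] == '{')
        ++Depth;
      else if (IR[j] == '}')
        --Depth;
      ++j;
    }
    i = j;
  }
  return Out;
}
```

Applying this to half of the module at a time gives the binary search over functions described above.]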
Cheers, tobi Cheers, Tobi From tobias at grosser.es Mon Jul 1 08:53:00 2013 From: tobias at grosser.es (Tobias Grosser) Date: Mon, 01 Jul 2013 08:53:00 -0700 Subject: [LLVMdev] [LNT] Question about results reliability in LNT infrustructure In-Reply-To: <601606fe.54cc.13f74d000d9.Coremail.tanmx_star@yeah.net> References: <601606fe.54cc.13f74d000d9.Coremail.tanmx_star@yeah.net> Message-ID: <51D1A5DC.2000607@grosser.es> On 06/23/2013 11:12 PM, Star Tan wrote: > Hi all, > > > When we compare two testings, each of which is run with three samples, how would LNT show whether the comparison is reliable or not? > > > I have seen that the function get_value_status in reporting/analysis.py uses a very simple algorithm to infer data status. For example, if abs(self.delta) <= (self.stddev * confidence_interval), then the data status is set as UNCHANGED. However, it is obviously not enough. For example, assuming both self.delta (e.g. 60%) and self.stddev (e.g. 50%) are huge, but self.delta is slightly larger than self.stddev, LNT will report to readers that the performance improvement is huge without considering the huge stddev. I think one way is to normalize the performance improvements by considering the stddev, but I am not sure whether it has been implemented in LNT. > > > Could anyone give some suggestions that how can I find out whether the testing results are reliable in LNT? Specifically, how can I get the normalized performance improvement/regression by considering the stderr? Hi Star Tan, I just attached you some hacks I tried on the week-end. The attached patch prints the confidence intervals in LNT. If you like you can take them as an inspiration (not directly copy) to print those values in your lnt server. (The patches require scipy and numpy being installed in your python sandbox. This should be OK for our experiments, but we probably do not want to reimplement those functions before upstreaming). Also, as Anton suggested. 
It may make sense to rerun your experiments with a larger number of samples. As the machine is currently not loaded and we do not track individual commits, 10 samples should probably be good enough. Cheers, Tobias -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-My-confidence-measurement-hacks.patch Type: text/x-diff Size: 8799 bytes Desc: not available URL: From tobias at grosser.es Mon Jul 1 08:58:06 2013 From: tobias at grosser.es (Tobias Grosser) Date: Mon, 01 Jul 2013 08:58:06 -0700 Subject: [LLVMdev] [Polly][GSOC2013] FastPolly -- SCOP Detection Pass In-Reply-To: <38376141.5097.13f9acfe200.Coremail.tanmx_star@yeah.net> References: <4033006f.2d60.13f92656fcc.Coremail.tanmx_star@yeah.net> <51CF7D1A.8020200@grosser.es> <38376141.5097.13f9acfe200.Coremail.tanmx_star@yeah.net> Message-ID: <51D1A70E.3080806@grosser.es> On 07/01/2013 08:18 AM, Star Tan wrote: >>> (3) About detecting scop regions in bottom-up order. > >>> Detecting scop regions in bottom-up order can significantly speed up the scop detection pass. However, as I have discussed with Sebastian, detecting scops in bottom-up order and up-bottom order will lead to different results. As a result, we should not change the detection order. >> >> Sebastian had a patch for this. Does his patch improve the scop >> detection time. > > LNT testing results for Sebastian's patch file can be seen on http://188.40.87.11:8000/db_default/v4/nts/recent_activity (Run Order: ScopDetect130615). You can compare ScopDetect130615 (Polly with Sebastian's patch) to pOpt130615 (Polly without Sebastian's patch). The result seems not show significant performance improvements with the bottom-up patch for LLVM test-suite benchmarks. > You are right. I think I should better firstly focus on the some simple examples, such as oggenc, to understand where scop detection pass spend its time. After that, we can then investigate the scop detection order. This is actually surprising. 
I expected his patch to make a difference in the scop detect time. Btw, if you rerun the experiments, it probably makes more sense to not modify the commit number, but instead use different lnt tester names to report the changes. Changing the commit numbers causes crashes e.g. when you click on the links for individual benchmarks. Cheers, Tobias From renato.golin at linaro.org Mon Jul 1 09:41:07 2013 From: renato.golin at linaro.org (Renato Golin) Date: Mon, 1 Jul 2013 17:41:07 +0100 Subject: [LLVMdev] [LNT] Question about results reliability in LNT infrustructure In-Reply-To: References: <601606fe.54cc.13f74d000d9.Coremail.tanmx_star@yeah.net> <51CC62CD.9060705@grosser.es> <3AD17769-0EBF-44EB-B84F-000E29131C3E@apple.com> <4BB1DA76-77CE-4FF1-8B89-45ED4810EDD3@apple.com> <429E8CF2-36A0-4546-B1DB-78929EEDB4A0@apple.com> <51cd57cb.aac6b40a.0fc4.0bc3SMTPIN_ADDED_BROKEN@mx.google.com> <51cd8a21.cc14b40a.1278.7259SMTPIN_ADDED_BROKEN@mx.google.com> <680FA65D-AF7C-44BC-8DCA-A018CF782609@apple.com> <51CF93AD.8060200@grosser.es> Message-ID: On 1 July 2013 02:02, Chris Matthews wrote: > One thing that LNT is doing to help “smooth” the results for you is by > presenting the min of the data at a particular revision, which (hopefully) > is approximating the actual runtime without noise. > That's an interesting idea, as you said, if you run multiple times on every revision. On ARM, every run takes *at least* 1h, other architectures might be a lot worse. It'd be very important on those architectures if you could extract point information from group data, and min doesn't fit in that model. You could take min from a group of runs, but again, that's no different than moving averages. Though, "moving mins" might make more sense than "moving averages" for the reasons you exposed. 
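[Editor's note: the two smoothers under discussion, side by side — a trailing moving average and a trailing "moving min" over the last W per-revision samples. If timing noise is strictly additive, the min tracks the noise-free runtime more closely than the average, which is the intuition behind "moving mins". A minimal sketch:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Trailing moving average over a window of up to W samples.
static std::vector<double> movingAvg(const std::vector<double> &V, size_t W) {
  std::vector<double> Out;
  for (size_t i = 0; i < V.size(); ++i) {
    size_t Begin = i + 1 >= W ? i + 1 - W : 0;
    double Sum = 0;
    for (size_t j = Begin; j <= i; ++j)
      Sum += V[j];
    Out.push_back(Sum / double(i - Begin + 1));
  }
  return Out;
}

// Trailing moving minimum: the best observed runtime in the window.
static std::vector<double> movingMin(const std::vector<double> &V, size_t W) {
  std::vector<double> Out;
  for (size_t i = 0; i < V.size(); ++i) {
    size_t Begin = i + 1 >= W ? i + 1 - W : 0;
    Out.push_back(*std::min_element(V.begin() + Begin, V.begin() + i + 1));
  }
  return Out;
}
```

A single noisy spike perturbs every average window it falls in, but leaves the moving min untouched — while a genuine regression eventually raises both.]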
Also, on tests that take as long as noise to run (0.010s or less on A15), the minimum is not relevant, since runtime will flatten everything under 0.010 onto 0.010, making your test always report 0.010, even when there are regressions. I really cannot see how you can statistically enhance data in a scenario where the measuring rod is larger than the signal. We need to change the wannabe-benchmarks to behave like proper benchmarks, and move everything else into "Applications" for correctness and specifically NOT time them. Less is more. That works well with a lot of samples per revision, but not for across > revisions, where we really need the smoothing. One way to explore this is > to turn > I was really looking forward to hearing the end of that sentence... ;) We also lack any way to coordinate or annotate regressions, that is a whole > separate problem though. > Yup. I'm having visions of tag clouds, bugzilla integration, cross-architectural regression detection, etc. But I'll ignore that for now, let's solve one big problem at a time. ;) cheers, --renato -------------- next part -------------- An HTML attachment was scrubbed... URL: From atrick at apple.com Mon Jul 1 10:10:25 2013 From: atrick at apple.com (Andrew Trick) Date: Mon, 01 Jul 2013 12:10:25 -0500 Subject: [LLVMdev] MI Scheduler vs SD Scheduler? In-Reply-To: <1372448337.69444.YahooMailNeo@web125505.mail.ne1.yahoo.com> References: <1372448337.69444.YahooMailNeo@web125505.mail.ne1.yahoo.com> Message-ID: Sent from my iPhone On Jun 28, 2013, at 2:38 PM, Ghassan Shobaki wrote: > Hi, > > We are currently in the process of upgrading from LLVM 2.9 to LLVM 3.3. We are working on instruction scheduling (mainly for register pressure reduction). I have been following the llvmdev mailing list and have learned that a machine instruction (MI) scheduler has been implemented to replace (or work with?) the selection DAG (SD) scheduler.
However, I could not find any document that describes the new MI scheduler and how it differs from and relates to the SD scheduler. MI is now the place to implement any heuristics for profitable scheduling. SD scheduler will be directly replaced by a new pass that orders the DAG as close as it can to IR order. We currently emulate this with -pre-RA-sched=source. The only thing necessarily different about MI sched is that it runs after reg coalescing and before reg alloc, and maintains live interval analysis. As a result, register pressure tracking is more accurate. It also uses a new target interface for precise register pressure. MI sched is intended to be a convenient place to implement target specific scheduling. There is a generic implementation that uses standard heuristics to reduce register pressure and balance latency and CPU resources. That is what you currently get when you enable MI sched for x86. The generic heuristics are implemented as a priority function that makes a greedy choice over the ready instructions based on the current pressure and the resources and latency of the scheduled and unscheduled set of instructions. An DAG subtree analysis also exists (ScheduleDFS), which can be used for register pressure avoidance. This isn't hooked up to the generic heuristics yet for lack of interesting test cases. > So, I would appreciate any pointer to a document (or a blog) that may help us understand the difference and the relation between the two schedulers and figure out how to deal with them. We are trying to answer the following questions: > > - A comment at the top of the file ScheduleDAGInstrs says that this file implements re-scheduling of machine instructions. So, what does re-scheduling mean? Rescheduling just means optional scheduling. That's really what the comment should say. It's important to know that MI sched can be skipped for faster compilation. 
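[Editor's note: the rough shape of plugging a custom strategy into the MachineScheduler pass. All names containing "My" are placeholders; this is an illustrative fragment against the LLVM 3.3-era MachineScheduler.h, not compilable on its own — check the header for the exact virtual signatures in your tree:

```cpp
#include "llvm/CodeGen/MachineScheduler.h"
using namespace llvm;

namespace {
// A MachineSchedStrategy supplies the heuristics; ScheduleDAGMI supplies
// the machinery (DAG construction, liveness update, queue plumbing).
class MyStrategy : public MachineSchedStrategy {
public:
  virtual void initialize(ScheduleDAGMI *DAG) { /* set up ready queues */ }
  virtual SUnit *pickNode(bool &IsTopNode) { /* greedy choice */ return 0; }
  virtual void schedNode(SUnit *SU, bool IsTopNode) {}
  virtual void releaseTopNode(SUnit *SU) {}
  virtual void releaseBottomNode(SUnit *SU) {}
};
} // end anonymous namespace

static ScheduleDAGInstrs *createMyScheduler(MachineSchedContext *C) {
  return new ScheduleDAGMI(C, new MyStrategy());
}

// Makes the scheduler selectable on the command line.
static MachineSchedRegistry
MySchedRegistry("my-sched", "Custom register-pressure strategy",
                createMyScheduler);
```

Returning null from pickNode signals that scheduling of the region is complete.]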
> Does it mean that the real scheduling algorithms (such as reg pressure reduction) are currently implemented in the SD scheduler, while the MI scheduler does some kind of complementary work (fine tuning) at a lower level representation of the code? > And what's the future plan? Is it to move the real scheduling algorithms into the MI scheduler and get rid of the SD scheduler? Will that happen in 3.4 or later? I would like to get rid of the SD scheduler so we can reduce compile time by streamline the scheduling data structures and interfaces. There may be some objection to doing that in 3.4 if projects haven't been able to migrate. It will be deprecated though. > > - Based on our initial investigation of the default behavior at -O3 on x86-64, it appears that the SD scheduler is called while the MI scheduler is not. That's consistent with the above interpretation of re-scheduling, but I'd appreciate any advice on what we should do at this point. Should we integrate our work (an alternate register pressure reduction scheduler) into the SD scheduler or the MI scheduler? Please refer to my recent messages on llvmdev regarding enabling MI scheduling by default on x86. http://article.gmane.org/gmane.comp.compilers.llvm.devel/63242/match=machinescheduler I suggest integrating with the MachineScheduler pass. There are many places to plug in. MachineSchedRegistry provides the hook. At that point you can define your own ScheduleDAGInstrs or ScheduleDAGMI subclass. People who only want to define new heuristics should reuse ScheduleDAGMI directly and only define their own MachineSchedStrategy. > > - Our SPEC testing on x86-64 has shown a significant performance improvement of LLVM 3.3 relative to LLVM 2.9 (about 5% in geomean on INT2006 and 15% in geomean on FP2006), but our spill code measurements have shown that LLVM 3.3 generates significantly more spill code on most benchmarks. 
We will be doing more investigation on this, but are there any known facts that explain this behavior? Is this caused by a known regression in scheduling and/or allocation (which I doubt) or by the implementation (or enabling) of some new optimization(s) that naturally increase(s) register pressure? > There is not a particular known regression. It's not surprising that optimizations increase pressure. Andy > Thank you in advance! > > Ghassan Shobaki > Assistant Professor > Department of Computer Science > Princess Sumaya University for Technology > Amman, Jordan > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stoklund at 2pi.dk Mon Jul 1 11:13:09 2013 From: stoklund at 2pi.dk (Jakob Stoklund Olesen) Date: Mon, 01 Jul 2013 11:13:09 -0700 Subject: [LLVMdev] [LNT] Question about results reliability in LNT infrustructure In-Reply-To: References: <601606fe.54cc.13f74d000d9.Coremail.tanmx_star@yeah.net> <51CC62CD.9060705@grosser.es> <3AD17769-0EBF-44EB-B84F-000E29131C3E@apple.com> <4BB1DA76-77CE-4FF1-8B89-45ED4810EDD3@apple.com> <429E8CF2-36A0-4546-B1DB-78929EEDB4A0@apple.com> <51cd57cb.aac6b40a.0fc4.0bc3SMTPIN_ADDED_BROKEN@mx.google.com> <51cd8a21.cc14b40a.1278.7259SMTPIN_ADDED_BROKEN@mx.google.com> <680FA65D-AF7C-44BC-8DCA-A018CF782609@apple.com> <51CF93AD.8060200@grosser.es> Message-ID: On Jun 30, 2013, at 6:02 PM, Chris Matthews wrote: > This is probably another area where a bit of dynamic behavior could help. When we find a regressions, kick off some runs to bisect back to where it manifests. This is what we would be doing manually anyway. We could just search back with the set of regressing benchmarks, meaning the whole suite does not have to be run (unless it is a global regression). > > There are situations where we see commit which make things slower then faster again, but so far those seem to be from experimental features being switched on then off. 
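[Editor's note: the "kick off some runs to bisect back" idea Chris describes is an ordinary binary search over revisions. A sketch, where IsBad stands in for "check out the revision, rebuild, and rerun only the regressing benchmarks", and the regression is assumed to flip exactly once between the known-good and known-bad revisions:

```cpp
#include <cstddef>
#include <functional>

// Given a known-good revision Good and known-bad revision Bad (Good < Bad),
// return the first revision at which IsBad reports the regression.
static std::size_t firstBadRev(std::size_t Good, std::size_t Bad,
                               const std::function<bool(std::size_t)> &IsBad) {
  while (Bad - Good > 1) {
    std::size_t Mid = Good + (Bad - Good) / 2;
    if (IsBad(Mid))
      Bad = Mid; // regression already present at Mid
    else
      Good = Mid; // still clean at Mid
  }
  return Bad;
}
```

For a window of N commits this costs about log2(N) extra build-and-run cycles, each running only the regressing subset rather than the whole suite.]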
This is an interesting paper: http://people.cs.umass.edu/~emery/pubs/stabilizer-asplos13.pdf "However, caches and branch predictors make performance dependent on machine-specific parameters and the exact layout of code, stack frames, and heap objects. A single binary constitutes just one sample from the space of program layouts, regardless of the number of runs. Since compiler optimizations and code changes also alter layout, it is currently impossible to distinguish the impact of an optimization from that of its layout effects." "We find that the performance impact of -O3 over -O2 optimizations is indistinguishable from random noise.” Thanks, /jakob From qcolombet at apple.com Mon Jul 1 11:30:30 2013 From: qcolombet at apple.com (Quentin Colombet) Date: Mon, 01 Jul 2013 11:30:30 -0700 Subject: [LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering? Message-ID: <817842A1-B12A-4AD1-A439-7A83F9C3F62B@apple.com> Hi, ** Problematic ** I am looking for advices to share some logic between DAG combine and target lowering. Basically, I need to know if a bitcast that is about to be inserted during target specific isel lowering will be eliminated during DAG combine. Let me know if there is another, better supported, approach for this kind of problems. ** Motivating Example ** The motivating example comes form the lowering of vector code on armv7. More specifically, the build_vector node is lowered to a target specific ARMISD::build_vector where all the parameters are bitcasted to floating point types. This works well, unless the inserted bitcasts survive until instruction selection. In that case, they incur moves between integer unit and floating point unit that may result in inefficient code. Attached motivating_example.ll shows such a case: llc -O3 -mtriple thumbv7-apple-ios3 motivating_example.ll -o - ldr r0, [r1] ldr r1, [r2] vmov s1, r1 vmov s0, r0 Here each ldr, vmov sequences could have been replaced by a simple vld1.32. 
** Proposed Solution ** Lower to more vector friendly code (using a sequence of insert_vector_elt), when bit casts will not be free. The attached patch demonstrates that, but is missing the proper check to know what DAG combine will do (see TODO). Thanks for your help. Cheers, -Quentin -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ARMISelLowering.patch Type: application/octet-stream Size: 1288 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: motivating_example.ll Type: application/octet-stream Size: 1114 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From eli.friedman at gmail.com Mon Jul 1 11:52:03 2013 From: eli.friedman at gmail.com (Eli Friedman) Date: Mon, 1 Jul 2013 11:52:03 -0700 Subject: [LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering? In-Reply-To: <817842A1-B12A-4AD1-A439-7A83F9C3F62B@apple.com> References: <817842A1-B12A-4AD1-A439-7A83F9C3F62B@apple.com> Message-ID: On Mon, Jul 1, 2013 at 11:30 AM, Quentin Colombet wrote: > Hi, > > ** Problematic ** > I am looking for advices to share some logic between DAG combine and > target lowering. > > Basically, I need to know if a bitcast that is about to be inserted during > target specific isel lowering will be eliminated during DAG combine. > > Let me know if there is another, better supported, approach for this kind > of problems. > > ** Motivating Example ** > The motivating example comes form the lowering of vector code on armv7. > More specifically, the build_vector node is lowered to a target specific > ARMISD::build_vector where all the parameters are bitcasted to floating > point types. 
> > This works well, unless the inserted bitcasts survive until instruction > selection. In that case, they incur moves between integer unit and floating > point unit that may result in inefficient code. > > Attached motivating_example.ll shows such a case: > llc -O3 -mtriple thumbv7-apple-ios3 motivating_example.ll -o - > ldr r0, [r1] > ldr r1, [r2] > vmov s1, r1 > vmov s0, r0 > Here each ldr, vmov sequences could have been replaced by a simple vld1.32. > > ** Proposed Solution ** > Lower to more vector friendly code (using a sequence of > insert_vector_elt), when bit casts will not be free. > The attached patch demonstrates that, but is missing the proper check to > know what DAG combine will do (see TODO). > I think you're approaching this backwards: the obvious thing to do is to generate the insert_vector_elt sequence unconditionally, and DAGCombine that sequence to a build_vector when appropriate. -Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.albuschat at gmail.com Mon Jul 1 12:03:50 2013 From: d.albuschat at gmail.com (Daniel Albuschat) Date: Mon, 1 Jul 2013 21:03:50 +0200 Subject: [LLVMdev] Implementing closures and continuations In-Reply-To: <8F635F6A-56F9-482B-9FCB-CC7627F1D3D6@icloud.com> References: <98865B9A-9C21-477E-B2A7-65312C914B9C@icloud.com> <8F635F6A-56F9-482B-9FCB-CC7627F1D3D6@icloud.com> Message-ID: 2013/6/30 David Farler > > > On Jun 29, 2013, at 9:53 PM, Eli Friedman wrote: > > On Sat, Jun 29, 2013 at 7:51 PM, David Farler wrote: > >> Hi all, >> >> In getting to know the LLVM infrastructure, I'm having a hard time >> finding implementation details for closures and continuations. >> >> For closures, I've read comments such as "using a struct" as an >> environment to hold references to free variables, linked lists to >> dictionaries for various scope levels, and even things like "it's just like >> virtual methods". 
I have a couple of questions regarding codegen, >> especially in the context of a Lispy/Haskell-like language with automatic >> reference counting of immutable objects allocated on the heap. >> >> What needs to be added to Functions during code generation? Is it >> really just a struct holding offsets to free variables? Or do symbols >> need to be looked up in a kind of scope chain at runtime? >> >> Does adding a JIT complicate codegen of closures in terms of symbol >> lookup in the bundled environment? >> >> Any recommendations for treating functions with no free variables vs >> closures? >> >> Does implementing continuations greatly affect implementations of >> closures? I assume the stack would need to be heap-allocated explicitly, >> which might affect how pointers would be saved in a bundled environment. >> > > Please read http://llvm.org/docs/tutorial/index.html . I'm having > trouble coming up with answers to most of your questions. > > If you need completely general continuations, yes, you'll want to > explicitly allocate your stack. > > -Eli > > > Thanks, this is the tutorial I've been reading and it's a great resource. > I think I just need to start looking at some languages that already > implement this. I can see now that free variable addresses are known at > compile time and can be used in the future provided they are allocated on > or moved to the heap. > > Still wondering about how to pack the structs and how eval/JIT affects > this; I'll keep looking. > Hello David, is your project up on GitHub somewhere? I'm implementing something that is relatively similar (at this point), so it would be nice if I could have a look at your code. As I'm still in the "learning LLVM IR" phase, I don't have much code up yet, but if you like I can post it as soon as something is available. Greetings, Daniel Albuschat -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From qcolombet at apple.com Mon Jul 1 12:07:46 2013 From: qcolombet at apple.com (Quentin Colombet) Date: Mon, 01 Jul 2013 12:07:46 -0700 Subject: [LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering? In-Reply-To: References: <817842A1-B12A-4AD1-A439-7A83F9C3F62B@apple.com> Message-ID: <8A644B21-71D9-42B3-94C2-C6444490F35B@apple.com> On Jul 1, 2013, at 11:52 AM, Eli Friedman wrote: > On Mon, Jul 1, 2013 at 11:30 AM, Quentin Colombet wrote: > Hi, > > ** Problematic ** > I am looking for advices to share some logic between DAG combine and target lowering. > > Basically, I need to know if a bitcast that is about to be inserted during target specific isel lowering will be eliminated during DAG combine. > > Let me know if there is another, better supported, approach for this kind of problems. > > ** Motivating Example ** > The motivating example comes form the lowering of vector code on armv7. > More specifically, the build_vector node is lowered to a target specific ARMISD::build_vector where all the parameters are bitcasted to floating point types. > > This works well, unless the inserted bitcasts survive until instruction selection. In that case, they incur moves between integer unit and floating point unit that may result in inefficient code. > > Attached motivating_example.ll shows such a case: > llc -O3 -mtriple thumbv7-apple-ios3 motivating_example.ll -o - > ldr r0, [r1] > ldr r1, [r2] > vmov s1, r1 > vmov s0, r0 > Here each ldr, vmov sequences could have been replaced by a simple vld1.32. > > ** Proposed Solution ** > Lower to more vector friendly code (using a sequence of insert_vector_elt), when bit casts will not be free. > The attached patch demonstrates that, but is missing the proper check to know what DAG combine will do (see TODO). 
> > I think you're approaching this backwards: the obvious thing to do is to generate the insert_vector_elt sequence unconditionally, and DAGCombine that sequence to a build_vector when appropriate. Thanks Eli. I will try that approach. -Quentin > > -Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From Don.Hinton at pimco.com Mon Jul 1 12:21:34 2013 From: Don.Hinton at pimco.com (Hinton, Don) Date: Mon, 1 Jul 2013 19:21:34 +0000 Subject: [LLVMdev] Build problem with nonstandard lib directory In-Reply-To: References: Message-ID: <1FFE4103C5AB804490642E9EEDD7C23D26AF8238@INE001P03.PIMCO.IMSWEST.SSCIMS.com> Hi Andy: Please add this to your configure invocation: --with-gcc-toolchain=/depot/gcc-4.5.2 hth... don From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Andy Jost Sent: Sunday, June 30, 2013 6:12 PM To: LLVMdev at cs.uiuc.edu Subject: [LLVMdev] Build problem with nonstandard lib directory I'm having trouble building LLVM 3.3 on RHEL 2.6.9-89.ELlargesmp x86_64. This is most likely a problem with my not knowing the advanced usage of the configure script, but I'm stuck just the same. I hope this list is an appropriate place for this question. The system I'm working on is a bit unusual. It is a shared server not administered by me (I can't change the configuration), and the "normal" system directories like /usr/include, /usr/lib and so on contain mostly very old and outdated things. For instance, /usr/bin/g++ is gcc version 3.4.6. I access newer versions through a non-standard directory, e.g., /depot/gcc-4.5.2/bin/g++ is gcc version 4.5.2 (the version I want to use). 
I've checked out the LLVM 3.3 code, and I try to configure and build it like this: ../src/llvm-3.3/configure --enable-doxygen --enable-jit --prefix=`pwd`/bin gmake -j12 After several steps succeed, I eventually get the following error: /build/Debug+Asserts/bin/llvm-tblgen: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.5' not found (required by /build/Debug+Asserts/bin/llvm-tblgen) I know this error. It occurs when the default /usr/lib64 path is used for dynamic loading instead of the non-standard /depot/gcc-4.5.2/lib64. Normally, I need to set LD_LIBRARY_PATH on this system. I confirmed that with LD_LIBRARY_PATH set, I can invoke llvm-tblgen, and without that variable set, I get exactly the error shown. Also, LD_LIBRARY_PATH is correctly set when configure is called, and when gmake is called. So how can I adjust the build process? I suspect there is an option I can pass to configure, but I haven't been able to figure out what it would be. For example, I tried adding LD_LIBRARY_PATH=$LD_LIBRARY_PATH to the configure command line to no avail. Any help would be greatly appreciated! -Andy This message contains confidential information and is intended only for the individual named. If you are not the named addressee, you should not disseminate, distribute, alter or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmissions cannot be guaranteed to be secure or without error as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender, therefore, does not accept liability for any errors or omissions in the contents of this message which arise during or as a result of e-mail transmission. If verification is required, please request a hard-copy version. 
This message is provided for information purposes and should not be construed as a solicitation or offer to buy or sell any securities or related financial instruments in any jurisdiction. -------------- next part -------------- An HTML attachment was scrubbed... URL: From renato.golin at linaro.org Mon Jul 1 12:26:09 2013 From: renato.golin at linaro.org (Renato Golin) Date: Mon, 1 Jul 2013 20:26:09 +0100 Subject: [LLVMdev] [LNT] Question about results reliability in LNT infrustructure In-Reply-To: References: <601606fe.54cc.13f74d000d9.Coremail.tanmx_star@yeah.net> <51CC62CD.9060705@grosser.es> <3AD17769-0EBF-44EB-B84F-000E29131C3E@apple.com> <4BB1DA76-77CE-4FF1-8B89-45ED4810EDD3@apple.com> <429E8CF2-36A0-4546-B1DB-78929EEDB4A0@apple.com> <51cd57cb.aac6b40a.0fc4.0bc3SMTPIN_ADDED_BROKEN@mx.google.com> <51cd8a21.cc14b40a.1278.7259SMTPIN_ADDED_BROKEN@mx.google.com> <680FA65D-AF7C-44BC-8DCA-A018CF782609@apple.com> <51CF93AD.8060200@grosser.es> Message-ID: On 1 July 2013 19:13, Jakob Stoklund Olesen wrote: > "We find that the performance impact of -O3 over -O2 optimizations is > indistinguishable from random noise." > Yes, we circulated that paper a few weeks ago on the list. This reminds me of an interesting project I saw that runs a genetic algorithm on the LLVM optimization passes using opt, and you can get some interesting results from it that would help you distinguish O2 from O3, but on a test-by-test basis. There's no way yet to know beforehand what is the best combination of passes for each specific input. That is, if I'm not mistaken, one of the reasons to enable the loop vectorizer at -O2 and -Os by default, as well as at -O3, in LLVM. cheers, --renato -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From qcolombet at apple.com Mon Jul 1 13:33:29 2013 From: qcolombet at apple.com (Quentin Colombet) Date: Mon, 01 Jul 2013 13:33:29 -0700 Subject: [LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering? In-Reply-To: References: <817842A1-B12A-4AD1-A439-7A83F9C3F62B@apple.com> Message-ID: On Jul 1, 2013, at 11:52 AM, Eli Friedman wrote: > On Mon, Jul 1, 2013 at 11:30 AM, Quentin Colombet wrote: > Hi, > > ** Problematic ** > I am looking for advices to share some logic between DAG combine and target lowering. > > Basically, I need to know if a bitcast that is about to be inserted during target specific isel lowering will be eliminated during DAG combine. > > Let me know if there is another, better supported, approach for this kind of problems. > > ** Motivating Example ** > The motivating example comes form the lowering of vector code on armv7. > More specifically, the build_vector node is lowered to a target specific ARMISD::build_vector where all the parameters are bitcasted to floating point types. > > This works well, unless the inserted bitcasts survive until instruction selection. In that case, they incur moves between integer unit and floating point unit that may result in inefficient code. > > Attached motivating_example.ll shows such a case: > llc -O3 -mtriple thumbv7-apple-ios3 motivating_example.ll -o - > ldr r0, [r1] > ldr r1, [r2] > vmov s1, r1 > vmov s0, r0 > Here each ldr, vmov sequences could have been replaced by a simple vld1.32. > > ** Proposed Solution ** > Lower to more vector friendly code (using a sequence of insert_vector_elt), when bit casts will not be free. > The attached patch demonstrates that, but is missing the proper check to know what DAG combine will do (see TODO). > > I think you're approaching this backwards: the obvious thing to do is to generate the insert_vector_elt sequence unconditionally, and DAGCombine that sequence to a build_vector when appropriate. 
Hi Eli, I have started to look into the direction you gave me. I may have missed something, but I do not see how the proposed direction solves the issue. Indeed, to be able to DAGCombine insert_vector_elt sequences into an ARMISD::build_vector, I still need to know if it would be profitable, i.e., if DAGCombine will remove the bitcasts that combining/lowering is about to insert. Since target-specific DAG combines are also done in TargetLowering, I do not have access to more DAGCombine logic (at least DAGCombineInfo is not providing the required information). What did I miss? Thanks, -Quentin > > -Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From eli.friedman at gmail.com Mon Jul 1 13:45:12 2013 From: eli.friedman at gmail.com (Eli Friedman) Date: Mon, 1 Jul 2013 13:45:12 -0700 Subject: [LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering? In-Reply-To: References: <817842A1-B12A-4AD1-A439-7A83F9C3F62B@apple.com> Message-ID: On Mon, Jul 1, 2013 at 1:33 PM, Quentin Colombet wrote: > On Jul 1, 2013, at 11:52 AM, Eli Friedman wrote: > > On Mon, Jul 1, 2013 at 11:30 AM, Quentin Colombet > wrote: > >> Hi, >> >> ** Problematic ** >> I am looking for advices to share some logic between DAG combine and >> target lowering. >> >> Basically, I need to know if a bitcast that is about to be inserted >> during target specific isel lowering will be eliminated during DAG combine. >> >> Let me know if there is another, better supported, approach for this kind >> of problems. >> >> ** Motivating Example ** >> The motivating example comes form the lowering of vector code on armv7. >> More specifically, the build_vector node is lowered to a target specific >> ARMISD::build_vector where all the parameters are bitcasted to floating >> point types. >> >> This works well, unless the inserted bitcasts survive until instruction >> selection. 
In that case, they incur moves between integer unit and floating >> point unit that may result in inefficient code. >> >> Attached motivating_example.ll shows such a case: >> llc -O3 -mtriple thumbv7-apple-ios3 motivating_example.ll -o - >> ldr r0, [r1] >> ldr r1, [r2] >> vmov s1, r1 >> vmov s0, r0 >> Here each ldr, vmov sequences could have been replaced by a simple >> vld1.32. >> >> ** Proposed Solution ** >> Lower to more vector friendly code (using a sequence of >> insert_vector_elt), when bit casts will not be free. >> The attached patch demonstrates that, but is missing the proper check to >> know what DAG combine will do (see TODO). >> > > I think you're approaching this backwards: the obvious thing to do is to > generate the insert_vector_elt sequence unconditionally, and DAGCombine > that sequence to a build_vector when appropriate. > > Hi Eli, > > I have started to look into the direction you gave me. > > I may have miss something but I do not see how the proposed direction > solves the issue. Indeed to be able to DAGCombine a insert_vector_elt > sequences into a ARMISD::build_vector, I still need to know if it would be > profitable, i.e., if DAGCombine will remove the bitcasts that > combining/lowering is about to insert. > > Since target specific DAGCombine are also done in TargetLowering I do not > have access to more DAGCombine logic (at least DAGCombineInfo is not > providing the require information). > > What did I miss? > > Err, wait, sorry, my fault; I missed that you only insert the bitcasts on the other side of the branch. You should be able to do it the other way, though: generate the build_vector unconditionally, and pull insert_vector_elts out of it in a DAGCombine. (At this point, you know whether DAGCombine will remove the bit casts because if it could, it would have already done it.) -Eli -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From qcolombet at apple.com Mon Jul 1 13:58:07 2013 From: qcolombet at apple.com (Quentin Colombet) Date: Mon, 01 Jul 2013 13:58:07 -0700 Subject: [LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering? In-Reply-To: References: <817842A1-B12A-4AD1-A439-7A83F9C3F62B@apple.com> Message-ID: <9FD84595-EAA7-435E-8598-10599D69238C@apple.com> On Jul 1, 2013, at 1:45 PM, Eli Friedman wrote: > On Mon, Jul 1, 2013 at 1:33 PM, Quentin Colombet wrote: > On Jul 1, 2013, at 11:52 AM, Eli Friedman wrote: > >> On Mon, Jul 1, 2013 at 11:30 AM, Quentin Colombet wrote: >> Hi, >> >> ** Problematic ** >> I am looking for advices to share some logic between DAG combine and target lowering. >> >> Basically, I need to know if a bitcast that is about to be inserted during target specific isel lowering will be eliminated during DAG combine. >> >> Let me know if there is another, better supported, approach for this kind of problems. >> >> ** Motivating Example ** >> The motivating example comes form the lowering of vector code on armv7. >> More specifically, the build_vector node is lowered to a target specific ARMISD::build_vector where all the parameters are bitcasted to floating point types. >> >> This works well, unless the inserted bitcasts survive until instruction selection. In that case, they incur moves between integer unit and floating point unit that may result in inefficient code. >> >> Attached motivating_example.ll shows such a case: >> llc -O3 -mtriple thumbv7-apple-ios3 motivating_example.ll -o - >> ldr r0, [r1] >> ldr r1, [r2] >> vmov s1, r1 >> vmov s0, r0 >> Here each ldr, vmov sequences could have been replaced by a simple vld1.32. >> >> ** Proposed Solution ** >> Lower to more vector friendly code (using a sequence of insert_vector_elt), when bit casts will not be free. >> The attached patch demonstrates that, but is missing the proper check to know what DAG combine will do (see TODO). 
>> >> I think you're approaching this backwards: the obvious thing to do is to generate the insert_vector_elt sequence unconditionally, and DAGCombine that sequence to a build_vector when appropriate. > > Hi Eli, > > I have started to look into the direction you gave me. > > I may have miss something but I do not see how the proposed direction solves the issue. Indeed to be able to DAGCombine a insert_vector_elt sequences into a ARMISD::build_vector, I still need to know if it would be profitable, i.e., if DAGCombine will remove the bitcasts that combining/lowering is about to insert. > > Since target specific DAGCombine are also done in TargetLowering I do not have access to more DAGCombine logic (at least DAGCombineInfo is not providing the require information). > > What did I miss? > > > Err, wait, sorry, my fault; I missed that you only insert the bitcasts on the other side of the branch. > > You should be able to do it the other way, though: generate the build_vector unconditionally, and pull insert_vector_elts out of it in a DAGCombine. (At this point, you know whether DAGCombine will remove the bit casts because if it could, it would have already done it.) Makes sense. I will try that. Thanks! -Quentin -------------- next part -------------- An HTML attachment was scrubbed... URL: From silvas at purdue.edu Mon Jul 1 14:21:58 2013 From: silvas at purdue.edu (Sean Silva) Date: Mon, 1 Jul 2013 14:21:58 -0700 Subject: [LLVMdev] [bikeshed] Anyone have strong feelings about always putting `template <...>` on its own line? Message-ID: tl;dr If there are no objections I'd like to change clang-format's LLVM style to always put `template <...>` on its own line. I think it's a general code-layout consistency win and avoids some cases where trivial code changes result in significant formatting differences (see the last example). 
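Concretely, the proposal amounts to flipping a single option relative to the predefined LLVM style. As a standalone `.clang-format` file it would look something like this (a sketch; assumes a clang-format build new enough to have the `AlwaysBreakTemplateDeclarations` option):

```yaml
# Hypothetical project config: LLVM base style, but always break
# after a template declaration.
BasedOnStyle: LLVM
AlwaysBreakTemplateDeclarations: true
```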
Examples of the current behavior:

--------------
template <class ELFT>
class ELFState {

clang-format's to:

template <class ELFT> class ELFState {
--------------

--------------
template <class T>
static size_t vectorDataSize(const std::vector<T> &Vec) {

clang-format's to:

template <class T> static size_t vectorDataSize(const std::vector<T> &Vec) {
--------------

Pathological example:
--------------
template <typename T>
template <typename U>
Foo<T>::bar(U Uv) {
  for (unsigned i = 0, e = Uv.size(); i != e; ++i)
    use(i);
}

clang-format's to:

template <typename T> template <typename U> Foo<T>::bar(U Uv) {
  for (unsigned i = 0, e = Uv.size(); i != e; ++i)
    use(i);
}

while the completely minor modification s/int/unsigned/ results (due to line length restrictions) in clang-format agreeing with the former layout:

template <typename T>
template <typename U>
Foo<T>::bar(U Uv) {
  for (unsigned i = 0, e = Uv.size(); i != e; ++i)
    use(i);
}
--------------

I would prefer that two pieces of code with similar logical structure be similarly formatted, as much as reasonable, and I think that always breaking on template declarations is reasonable. clang-format has an option for this (`AlwaysBreakTemplateDeclarations`), but we don't have it enabled currently for LLVM style. Daniel Jasper informs me that the current setting is a carry-over from before the setting was introduced, where it was effectively always "false" (the current setting for LLVM style). I hate to bring up such a microscopic issue, but I find myself manually fixing clang-format's behavior with LLVM style when it comes to putting `template <...>` on its own line, since TBH I feel like a reviewer would ask me to change it. At a basic level I think what bothers me about it is that it breaks useful invariants while reading code: - A class or struct declaration always has `class` or `struct` as the first non-whitespace character on the line (also helpful when grepping). 
If the `template <...>` gets put on the same line, then you have to "parse through it" to find what is actually being declared. - There is a single way to lay out `template <...>`. With the current setting, clang-format will still put the `template <...>` on its own line in a lot of cases due to line length restrictions, but in others it will be put onto the same line. Hence in some cases you can "skip down past it" and in others you have to "parse through it", but worst of all you have to detect which it is when reading the code. -- Sean Silva -------------- next part -------------- An HTML attachment was scrubbed... URL: From dblaikie at gmail.com Mon Jul 1 14:31:45 2013 From: dblaikie at gmail.com (David Blaikie) Date: Mon, 1 Jul 2013 14:31:45 -0700 Subject: [LLVMdev] [bikeshed] Anyone have strong feelings about always putting `template <...>` on its own line? In-Reply-To: References: Message-ID: On Mon, Jul 1, 2013 at 2:21 PM, Sean Silva wrote: > tl;dr If there are no objections I'd like to change clang-format's LLVM > style to always put `template <...>` on its own line. I think it's a general > code-layout consistency win and avoids some cases where trivial code changes > result in significant formatting differences (see the last example). 
> Examples of the current behavior:
>
> --------------
> template <class ELFT>
> class ELFState {
>
> clang-format's to:
>
> template <class ELFT> class ELFState {
> --------------
>
> --------------
> template <class T>
> static size_t vectorDataSize(const std::vector<T> &Vec) {
>
> clang-format's to:
>
> template <class T> static size_t vectorDataSize(const std::vector<T> &Vec) {
> --------------
>
> Pathological example:
> --------------
> template <typename T>
> template <typename U>
> Foo<T>::bar(U Uv) {
> for (unsigned i = 0, e = Uv.size(); i != e; ++i)
> use(i);
> }
>
> clang-format's to:
>
> template <typename T> template <typename U> Foo<T>::bar(U
> Uv) {
> for (unsigned i = 0, e = Uv.size(); i != e; ++i)
> use(i);
> }
>
> while the completely minor modification s/int/unsigned/ results (due to line
> length restrictions) in clang-format agreeing with the former layout:
>
> template <typename T>
> template <typename U>
> Foo<T>::bar(U Uv) {
> for (unsigned i = 0, e = Uv.size(); i != e; ++i)
> use(i);
> }
> --------------
>
> I would prefer that two pieces of code with similar logical structure to be
> similarly formatted, as much as reasonable, and I think that always breaking
> on template declarations is reasonable.
>
> clang-format has an option for this (`AlwaysBreakTemplateDeclarations`), but
> we don't have it enabled currently for LLVM style. Daniel Jasper informs me
> that the current setting is a carry-over from before the setting was
> introduced, where it was effectively always "false" (the current setting for
> LLVM style).
>
> I hate to bring up such a microscopic issue, but I find myself manually
> fixing clang-format's behavior with LLVM style when it comes to putting
> `template <...>` on its own line, since TBH I feel like a reviewer would ask
> me to change it.
>
> At a basic level I think what bothers me about it is that it breaks useful
> invariants while reading code:
> - A class or struct declaration always has `class` or `struct` as the first
> non-whitespace character on the line (also helpful when grepping). 
> - When reading code, you can seek past the template by just going down to > the next line starting at the same indentation. If the `template <...>` gets > put on the same line, then you have to "parse through it" to find what is > actually being declared. > - There is a single way to lay out `template <...>`. With the current > setting, clang-format will still put the `template <...>` on its own line in > a lot of cases due to line length restrictions, but in others it will be put > onto the same line. Hence in some cases you can "skip down past it" and in > others you have to "parse through it", but worst of all you have to detect > which it is when reading the code. Have you got any statistics for the current state of LLVM with respect to this formatting issue? If something is already the overwhelmingly common style (& it's not a case where it used to be the style, the style has been updated, and nothing has been migrated yet) then just make clang-format agree with reality - this doesn't require a discussion or bikeshed. From silvas at purdue.edu Mon Jul 1 14:38:08 2013 From: silvas at purdue.edu (Sean Silva) Date: Mon, 1 Jul 2013 14:38:08 -0700 Subject: [LLVMdev] [bikeshed] Anyone have strong feelings about always putting `template <...>` on its own line? In-Reply-To: References: Message-ID: On Mon, Jul 1, 2013 at 2:31 PM, David Blaikie wrote: > > > Have you got any statistics for the current state of LLVM with respect > to this formatting issue? If something is already the overwhelmingly > common style (& it's not a case where it used to be the style, the > style has been updated, and nothing has been migrated yet) then just > make clang-format agree with reality - this doesn't require a > discussion or bikeshed. > It's not overwhelming, but the preponderance seems to be towards putting it on its own line (the exceptions are usually small trait specializations like isPodLike). 
I give some rough numbers here < http://thread.gmane.org/gmane.comp.compilers.llvm.devel/63378> (and see Daniel's reply). Daniel is open to changing it, but asked me to gather some more opinions. -- Sean Silva -------------- next part -------------- An HTML attachment was scrubbed... URL: From dblaikie at gmail.com Mon Jul 1 14:40:39 2013 From: dblaikie at gmail.com (David Blaikie) Date: Mon, 1 Jul 2013 14:40:39 -0700 Subject: [LLVMdev] [bikeshed] Anyone have strong feelings about always putting `template <...>` on its own line? In-Reply-To: References: Message-ID: On Mon, Jul 1, 2013 at 2:38 PM, Sean Silva wrote: > > > > On Mon, Jul 1, 2013 at 2:31 PM, David Blaikie wrote: >> >> >> Have you got any statistics for the current state of LLVM with respect >> to this formatting issue? If something is already the overwhelmingly >> common style (& it's not a case where it used to be the style, the >> style has been updated, and nothing has been migrated yet) then just >> make clang-format agree with reality - this doesn't require a >> discussion or bikeshed. > > > It's not overwhelming, but the preponderance seems to be towards putting it > on its own line (the exceptions are usually small trait specializations like > isPodLike). I give some rough numbers here > Fair enough - could we draw any further stylistic conclusions that could motivate clang-format? If the entire definition of the template fits on one line is it pretty consistent that it's defined on the one line rather than split? What about template declarations, if any? > (and see > Daniel's reply). Daniel is open to changing it, but asked me to gather some > more opinions. > > -- Sean Silva From silvas at purdue.edu Mon Jul 1 15:41:25 2013 From: silvas at purdue.edu (Sean Silva) Date: Mon, 1 Jul 2013 15:41:25 -0700 Subject: [LLVMdev] [bikeshed] Anyone have strong feelings about always putting `template <...>` on its own line? 
In-Reply-To: References: Message-ID: On Mon, Jul 1, 2013 at 2:40 PM, David Blaikie wrote: > On Mon, Jul 1, 2013 at 2:38 PM, Sean Silva wrote: > > > > > > > > On Mon, Jul 1, 2013 at 2:31 PM, David Blaikie > wrote: > >> > >> > >> Have you got any statistics for the current state of LLVM with respect > >> to this formatting issue? If something is already the overwhelmingly > >> common style (& it's not a case where it used to be the style, the > >> style has been updated, and nothing has been migrated yet) then just > >> make clang-format agree with reality - this doesn't require a > >> discussion or bikeshed. > > > > > > It's not overwhelming, but the preponderance seems to be towards putting > it > > on its own line (the exceptions are usually small trait specializations > like > > isPodLike). I give some rough numbers here > > > > Fair enough - could we draw any further stylistic conclusions that > could motivate clang-format? If the entire definition of the template > fits on one line is it pretty consistent that it's defined on the one > line rather than split? What about template declarations, if any? > > As a rough count, there are at least "hundreds" of cases where it changes previously existing template definitions onto one line (i.e., they would fit on one line but they weren't put on one line); this is more than the total number of one-line definitions. 
To obtain a lower bound on the cited "hundreds", I clang-format'd everything and then looked for just diff chunks similar to:

-template
-class ImmutableList {
+template class ImmutableList {

$ cd llvm/
$ clang-format -i **/*.cpp **/*.h
$ git diff | grep -B2 '^+template' | egrep -B1 '^-(struct|class)' | grep '^-template' | wc -l
287
# For comparison
$ git grep '^\s*template' -- '*.cpp' '*.h' | wc -l
2011

$ cd clang/
$ clang-format -i **/*.cpp **/*.h
$ git diff | grep -B2 '^+template' | egrep -B1 '^-(struct|class)' | grep '^-template' | wc -l
396
# For comparison
$ git grep '^\s*template' -- '*.cpp' '*.h' | wc -l
6713

Outside of clang's test/ directory, there is a really tiny number of one-line template definitions in clang:

$ cd clang/
$ git grep -E '^\s*template.*(class|struct).*{' -- lib include | wc -l
60

My general feel is that template declarations are usually one-lined in existing code, but it seems that it is about half and half:

$ git grep -E '^ *template *<[^>]*> *(class|struct) [A-Za-z0-9_]+;' | wc -l
78
$ git grep -A1 -E '^ *template' | egrep -- '- *(struct|class) [A-Za-z0-9_]+;' | wc -l
72

-- Sean Silva -------------- next part -------------- An HTML attachment was scrubbed... URL: From Andrew.Jost at synopsys.com Mon Jul 1 15:51:09 2013 From: Andrew.Jost at synopsys.com (Andy Jost) Date: Mon, 1 Jul 2013 22:51:09 +0000 Subject: [LLVMdev] Build problem with nonstandard lib directory In-Reply-To: <1FFE4103C5AB804490642E9EEDD7C23D26AF8238@INE001P03.PIMCO.IMSWEST.SSCIMS.com> References: <1FFE4103C5AB804490642E9EEDD7C23D26AF8238@INE001P03.PIMCO.IMSWEST.SSCIMS.com> Message-ID: Thanks, Don. This change had no effect. Anything else I could try?
174 15:47 mkdir build
175 15:47 cd build/
176 15:47 ../src/llvm-3.3/configure --enable-doxygen --enable-jit --prefix=`pwd` --with-gcc-toolchain=/depot/gcc-4.5.2/
177 15:49 gmake -j12

From: Hinton, Don [mailto:Don.Hinton at pimco.com] Sent: Monday, July 01, 2013 12:22 PM To: 'Andy Jost'; LLVMdev at cs.uiuc.edu Subject: RE: Build problem with nonstandard lib directory Hi Andy: Please add this to your configure invocation: --with-gcc-toolchain=/depot/gcc-4.5.2 hth... don From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Andy Jost Sent: Sunday, June 30, 2013 6:12 PM To: LLVMdev at cs.uiuc.edu Subject: [LLVMdev] Build problem with nonstandard lib directory I'm having trouble building LLVM 3.3 on RHEL 2.6.9-89.ELlargesmp x86_64. This is most likely a problem with my not knowing the advanced usage of the configure script, but I'm stuck just the same. I hope this list is an appropriate place for this question. The system I'm working on is a bit unusual. It is a shared server not administered by me (I can't change the configuration), and the "normal" system directories like /usr/include, /usr/lib and so on contain mostly very old and outdated things. For instance, /usr/bin/g++ is gcc version 3.4.6. I access newer versions through a non-standard directory, e.g., /depot/gcc-4.5.2/bin/g++ is gcc version 4.5.2 (the version I want to use). I've checked out the LLVM 3.3 code, and I try to configure and build it like this:

../src/llvm-3.3/configure --enable-doxygen --enable-jit --prefix=`pwd`/bin
gmake -j12

After several steps succeed, I eventually get the following error: /build/Debug+Asserts/bin/llvm-tblgen: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.5' not found (required by /build/Debug+Asserts/bin/llvm-tblgen) I know this error. It occurs when the default /usr/lib64 path is used for dynamic loading instead of the non-standard /depot/gcc-4.5.2/lib64. Normally, I need to set LD_LIBRARY_PATH on this system.
I confirmed that with LD_LIBRARY_PATH set, I can invoke llvm-tblgen, and without that variable set, I get exactly the error shown. Also, LD_LIBRARY_PATH is correctly set when configure is called, and when gmake is called. So how can I adjust the build process? I suspect there is an option I can pass to configure, but I haven't been able to figure out what it would be. For example, I tried adding LD_LIBRARY_PATH=$LD_LIBRARY_PATH to the configure command line to no avail. Any help would be greatly appreciated! -Andy This message contains confidential information and is intended only for the individual named. If you are not the named addressee, you should not disseminate, distribute, alter or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmissions cannot be guaranteed to be secure or without error as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender, therefore, does not accept liability for any errors or omissions in the contents of this message which arise during or as a result of e-mail transmission. If verification is required, please request a hard-copy version. This message is provided for information purposes and should not be construed as a solicitation or offer to buy or sell any securities or related financial instruments in any jurisdiction. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chandlerc at google.com Mon Jul 1 15:52:40 2013 From: chandlerc at google.com (Chandler Carruth) Date: Mon, 1 Jul 2013 15:52:40 -0700 Subject: [LLVMdev] [cfe-dev] [bikeshed] Anyone have strong feelings about always putting `template <...>` on its own line?
In-Reply-To: References: Message-ID: On Mon, Jul 1, 2013 at 2:21 PM, Sean Silva wrote: > tl;dr If there are no objections I'd like to change clang-format's LLVM > style to always put `template <...>` on its own line. > I would not like this change. > I think it's a general code-layout consistency win and avoids some cases > where trivial code changes result in significant formatting differences > (see the last example). > There are innumerable other such cases. I don't know that this in and of itself is an important goal. Personally, I favor compactness and brevity of reading the code slightly over this. I'm sure others will have different opinions and valuations. I'm not sure there is a significant consensus behind this particular issue. I would prefer that two pieces of code with similar logical structure to be > similarly formatted, as much as reasonable, and I think that always > breaking on template declarations is reasonable. > It's all in the trade offs you're willing to make. There are several things in clang-format today that will attempt a significantly different layout of code in order to minimize the number of lines required even though it only works when things happen to fit within 80 columns. I think those are still worth having because often, things fit within 80 columns! =D > I hate to bring up such a microscopic issue, but I find myself manually > fixing clang-format's behavior with LLVM style when it comes to putting > `template <...>` on its own line, since TBH I feel like a reviewer would > ask me to change it. > I, as a reviewer, would not ask you to change it. I can't recall a single review where this has come up. I don't think this alone is a good motivation.
I'll point out that while you give an extreme example in one direction, there are extreme examples in the other direction as well:

template template struct my_trait_a : true_type {};
template <> template <> struct my_trait_a<1, 2> : false_type {};
template <> template <> struct my_trait_a<2, 3> : false_type {};
template <> template <> struct my_trait_a<3, 4> : false_type {};
template <> template <> struct my_trait_a<4, 5> : false_type {};

I don't really relish these constructs consuming 3x the number of lines. Both of these are extreme cases, and they trade off differently. I think for the common case neither solution is bad, and the current behavior consumes slightly fewer lines of code. I think that's a reasonable stance and would vote to keep it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dblaikie at gmail.com Mon Jul 1 15:55:09 2013 From: dblaikie at gmail.com (David Blaikie) Date: Mon, 1 Jul 2013 15:55:09 -0700 Subject: [LLVMdev] [bikeshed] Anyone have strong feelings about always putting `template <...>` on its own line? In-Reply-To: References: Message-ID: On Mon, Jul 1, 2013 at 3:41 PM, Sean Silva wrote: > > > > On Mon, Jul 1, 2013 at 2:40 PM, David Blaikie wrote: >> >> On Mon, Jul 1, 2013 at 2:38 PM, Sean Silva wrote: >> > >> > >> > >> > On Mon, Jul 1, 2013 at 2:31 PM, David Blaikie >> > wrote: >> >> >> >> >> >> Have you got any statistics for the current state of LLVM with respect >> >> to this formatting issue? If something is already the overwhelmingly >> >> common style (& it's not a case where it used to be the style, the >> >> style has been updated, and nothing has been migrated yet) then just >> >> make clang-format agree with reality - this doesn't require a >> >> discussion or bikeshed. >> > >> > >> > It's not overwhelming, but the preponderance seems to be towards putting >> > it >> > on its own line (the exceptions are usually small trait specializations >> > like >> > isPodLike).
I give some rough numbers here >> > >> >> Fair enough - could we draw any further stylistic conclusions that >> could motivate clang-format? If the entire definition of the template >> fits on one line is it pretty consistent that it's defined on the one >> line rather than split? What about template declarations, if any? >> > > As a rough count, there are at least "hundreds" of cases where it changes > previously existing template definitions onto one line (i.e., they would fit > on one line but they weren't put on one line); this is more than the total > number of one-line definitions. No doubt - I'm just curious whether there's some internal logic/consistency that we've not discussed. If you're claiming that they are essentially "random" in the choice of wrap or no-wrap (in the cases that would be affected by clang-format if code were reformatted with it & clang-format chose one or the other as the default) then, sure, flip a coin & go with it. > To obtain a lower bound on the cited > "hundreds", I clang-format'd everything and then looked for just diff chunks > similar to: > > -template > -class ImmutableList { > +template class ImmutableList { > > $ cd llvm/ > $ clang-format -i **/*.cpp **/*.h > $ git diff | grep -B2 '^+template' | egrep -B1 '^-(struct|class)' | grep > '^-template' | wc -l > 287 > # For comparison > $ git grep '^\s*template' -- '*.cpp' '*.h' | wc -l > 2011 > > $ cd clang/ > $ clang-format -i **/*.cpp **/*.h > $ git diff | grep -B2 '^+template' | egrep -B1 '^-(struct|class)' | grep > '^-template' | wc -l > 396 > # For comparison > $ git grep '^\s*template' -- '*.cpp' '*.h' | wc -l > 6713 > > Outside of clang's test/ directory, there are a really tiny number of > one-line template definitions in clang: > $ cd clang/ > $ git grep -E '^\s*template.*(class|struct).*{' -- lib include | wc -l > 60 Yeah, none of the style choices should include evidence from test code - we write it completely differently. 
> My general feel is that template declarations are usually one-lined in > existing code, but it seems that it is about half and half: > $ git grep -E '^ *template *<[^>]*> *(class|struct) [A-Za-z0-9_]+;' | wc -l > 78 > $ git grep -A1 -E '^ *template' | egrep -- '- *(struct|class) > [A-Za-z0-9_]+;' | wc -l > 72 Again, precluding test code, are there any discernable differences between cases that are one line versus multiline? (some consistent choice being made that could be enshrined in clang-format (or even a consistent choice that is beyond the understanding of clang-format - that's still helpful to know)) From Andrew.Jost at synopsys.com Mon Jul 1 16:28:19 2013 From: Andrew.Jost at synopsys.com (Andy Jost) Date: Mon, 1 Jul 2013 23:28:19 +0000 Subject: [LLVMdev] Build problem with nonstandard lib directory References: <1FFE4103C5AB804490642E9EEDD7C23D26AF8238@INE001P03.PIMCO.IMSWEST.SSCIMS.com> Message-ID: I eventually found a solution to this problem, which I'll post here in case anyone ever has the same trouble. The solution was to add the -rpath linker option as follows: ../src/llvm-3.3/configure --enable-doxygen --enable-jit --prefix=`pwd` LDFLAGS=-Wl,-rpath=/depot/gcc-4.5.2/lib64 I got the idea from http://stackoverflow.com/questions/1952146/glibcxx-3-4-9-not-found -Andy From: Andy Jost Sent: Monday, July 01, 2013 3:51 PM To: 'Hinton, Don'; 'Andy Jost'; LLVMdev at cs.uiuc.edu Subject: RE: Build problem with nonstandard lib directory Thanks, Don. This change had no effect. Anything else I could try? 174 15:47 mkdir build 175 15:47 cd build/ 176 15:47 ../src/llvm-3.3/configure --enable-doxygen --enable-jit --prefix=`pwd` --with-gcc-toolchain=/depot/gcc-4.5.2/ 177 15:49 gmake -j12 From: Hinton, Don [mailto:Don.Hinton at pimco.com] Sent: Monday, July 01, 2013 12:22 PM To: 'Andy Jost'; LLVMdev at cs.uiuc.edu Subject: RE: Build problem with nonstandard lib directory Hi Andy: Please add this to your configure invocation: --with-gcc-toolchain=/depot/gcc-4.5.2 hth... 
don From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Andy Jost Sent: Sunday, June 30, 2013 6:12 PM To: LLVMdev at cs.uiuc.edu Subject: [LLVMdev] Build problem with nonstandard lib directory I'm having trouble building LLVM 3.3 on RHEL 2.6.9-89.ELlargesmp x86_64. This is most likely a problem with my not knowing the advanced usage of the configure script, but I'm stuck just the same. I hope this list is an appropriate place for this question. The system I'm working on is a bit unusual. It is a shared server not administered by me (I can't change the configuration), and the "normal" system directories like /usr/include, /usr/lib and so on contain mostly very old and outdated things. For instance, /usr/bin/g++ is gcc version 3.4.6. I access newer versions through a non-standard directory, e.g., /depot/gcc-4.5.2/bin/g++ is gcc version 4.5.2 (the version I want to use). I've checked out the LLVM 3.3 code, and I try to configure and build it like this:

../src/llvm-3.3/configure --enable-doxygen --enable-jit --prefix=`pwd`/bin
gmake -j12

After several steps succeed, I eventually get the following error: /build/Debug+Asserts/bin/llvm-tblgen: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.5' not found (required by /build/Debug+Asserts/bin/llvm-tblgen) I know this error. It occurs when the default /usr/lib64 path is used for dynamic loading instead of the non-standard /depot/gcc-4.5.2/lib64. Normally, I need to set LD_LIBRARY_PATH on this system. I confirmed that with LD_LIBRARY_PATH set, I can invoke llvm-tblgen, and without that variable set, I get exactly the error shown. Also, LD_LIBRARY_PATH is correctly set when configure is called, and when gmake is called. So how can I adjust the build process? I suspect there is an option I can pass to configure, but I haven't been able to figure out what it would be. For example, I tried adding LD_LIBRARY_PATH=$LD_LIBRARY_PATH to the configure command line to no avail.
Any help would be greatly appreciated! -Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: From silvas at purdue.edu Mon Jul 1 16:42:02 2013 From: silvas at purdue.edu (Sean Silva) Date: Mon, 1 Jul 2013 16:42:02 -0700 Subject: [LLVMdev] [bikeshed] Anyone have strong feelings about always putting `template <...>` on its own line? In-Reply-To: References: Message-ID: On Mon, Jul 1, 2013 at 3:55 PM, David Blaikie wrote: > I'm just curious whether there's some internal > logic/consistency that we've not discussed. > Not as far as I can tell. -- Sean Silva -------------- next part -------------- An HTML attachment was scrubbed... URL: From grosbach at apple.com Mon Jul 1 16:44:24 2013 From: grosbach at apple.com (Jim Grosbach) Date: Mon, 01 Jul 2013 16:44:24 -0700 Subject: [LLVMdev] Auxiliary operand types for disassembler.
In-Reply-To: <51CAEF21.5060301@codeaurora.org> References: <51C9B637.2010903@codeaurora.org> <8378A651-BA56-4790-9FD8-8D21D8C48D9E@apple.com> <51CAEF21.5060301@codeaurora.org> Message-ID: <8A3DCD42-EF0D-4250-AC57-F7BE27964806@apple.com> On Jun 26, 2013, at 6:39 AM, Sid Manning wrote: > On 06/25/2013 04:46 PM, Jim Grosbach wrote: >> Hi Sid, >> >> This feels like it’s exposing too much of the disassembler internals >> into the MCOperand representation. I’m not sure I follow why that’s >> necessary. Can you elaborate a bit? >> > A packet contains 1-4 insns and until the contents of the entire packet are known the meaning of any individual insn is not known with 100% certainty. Adding the auxiliary operand was a way for the printer to accumulate this information as the packet is being read. > > An alternative is to pass the raw insn to the target printer. This would have the same effect, giving the printer the chance to accumulate and interpret the insns when printing the contents of the packet. > > Here are some examples: > - Some insns contain a 3-bit new value, the new value bits, Nv[2:1] are set to 1, 2, or 3 if the producer is 1, 2, or 3 insns ahead of the consumer. Nv[0] is 1 if the producer is an odd register, 0 for even. > { > r17 = add(r2, r17) > r23 = add(r23, #-1) > if (!cmp.eq(r23.new, #0)) jump:t foobar > } > The above packet has 2 producers, r17 and r23. If the compare and jump is encoded as: 0x2443e000 where new value bits are stored in [18:16] and equal 0x3 then register 23 would be used - Nv[2:1] == 0x1. The producer was 1 insn back and the register is odd. If bits [18:16] had been 0x5 then register 17 would have been used. > > - Parse bits can be used to designate the end of a hardware loop. If the parsebits are set to 10b in the first insn of the packet then this packet is the end of hardware loop 0, if the parse bits in insn 1 are set to 01b and the parse bits in insn 2 are set to 10b then this is the last packet in hardware loop 1. 
If the parse bits in insn 1 and insn 2 are both set to 10b then this is the end in both hardware loops 0 and 1. At the tail of the packet the disassembler would add the following:
> }:endloop0
> }:endloop1
> }:endloop0:endloop1
> to represent the end of the various loops.
>
> The disassembler has to accumulate the MCInsts for the whole packet and either keep the raw hex encodings or append the needed info as an operand stored in the MCInst. The reason I tried using MCOperand is that it kept me from having to change objdump itself. > The representation of a generic MCInst should not need to change. While not exactly what you're dealing with, the ARM disassembler does have some multi-instruction context it has to maintain to get predicated instructions for Thumb2 correct. In your case, the disassembler will likely need to consume a packet at a time so it has the whole context. That is, the printer and encoder shouldn't have to know any of the context. The disassembler/codegen does all of it. -Jim > Thanks, > >> -Jim >> >> On Jun 25, 2013, at 8:24 AM, Sid Manning > > wrote: >>> >>> I'm working on a disassembler for hexagon (vliw) architecture and I >>> would like to add an additional operand type, "kAux" to the MCOperand >>> class. >>> >>> The reason for this is that each insn has parse bits which are not >>> explicit operands and have differing meanings based on the insn's >>> location within the packet and the number of insns inside the packet. >>> In order for the disassembler to correctly represent the insn it needs >>> to accumulate the series of insns that form the packet. Only when the >>> entire packet is known can the meaning of the parse bits be properly >>> interpreted. >>> >>> Changing objdump's interface to printInst so it passes the raw insn >>> bits down would allow the printer to accumulate the same information >>> and would work just as well I think. >>> >>> -- >>> Qualcomm Innovation Center, Inc.
is a member of Code Aurora Forum, >>> hosted by The Linux Foundation >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu >>> http://llvm.cs.uiuc.edu >>> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> > > > -- > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark.lacey at apple.com Mon Jul 1 16:52:56 2013 From: mark.lacey at apple.com (Mark Lacey) Date: Mon, 01 Jul 2013 16:52:56 -0700 Subject: [LLVMdev] Problem with building llvm and running project In-Reply-To: References: Message-ID: Hi Eirini, If you provide more information, such as the OS and compiler you are using to build, as well as the specific header files that the build is unable to find, it will make it easier for people on the list to help you. If you are building on Windows with VS, you will want to take a look at: http://llvm.org/docs/GettingStartedVS.html. Mark On Jul 1, 2013, at 2:55 AM, Eirini _ wrote: > Hello, > I am new to LLVM and I want to create my own project with a cpp file which calls > llvm functions and then run it. I downloaded the clang source, llvm source and compiler-rt source. > I tried to configure and build llvm using this http://llvm.org/docs/GettingStarted.html#getting-started-with-llvm but it failed because the .h files included at the top of Hello.cpp couldn't be found. Can you tell me the instructions for what I have to do in order to build llvm? Do I need to create a new build folder? What do I have to set in the Path variable? > After building llvm I would like to run my own project, let's say Hello.cpp found in the /lib/Transforms folder of the llvm source.
> > Thanks, > Eirini > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From qcolombet at apple.com Mon Jul 1 17:48:29 2013 From: qcolombet at apple.com (Quentin Colombet) Date: Mon, 01 Jul 2013 17:48:29 -0700 Subject: [LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering? In-Reply-To: References: <817842A1-B12A-4AD1-A439-7A83F9C3F62B@apple.com> Message-ID: <36AA7E49-7AF9-4454-881B-C6BA73FC53B1@apple.com> On Jul 1, 2013, at 1:45 PM, Eli Friedman wrote: > On Mon, Jul 1, 2013 at 1:33 PM, Quentin Colombet wrote: > On Jul 1, 2013, at 11:52 AM, Eli Friedman wrote: > >> On Mon, Jul 1, 2013 at 11:30 AM, Quentin Colombet wrote: >> Hi, >> >> ** Problem ** >> I am looking for advice on how to share some logic between DAG combine and target lowering. >> >> Basically, I need to know if a bitcast that is about to be inserted during target specific isel lowering will be eliminated during DAG combine. >> >> Let me know if there is another, better supported, approach for this kind of problem. >> >> ** Motivating Example ** >> The motivating example comes from the lowering of vector code on armv7. >> More specifically, the build_vector node is lowered to a target specific ARMISD::build_vector where all the parameters are bitcasted to floating point types. >> >> This works well, unless the inserted bitcasts survive until instruction selection. In that case, they incur moves between the integer unit and the floating point unit that may result in inefficient code. >> >> Attached motivating_example.ll shows such a case: >> llc -O3 -mtriple thumbv7-apple-ios3 motivating_example.ll -o - >> ldr r0, [r1] >> ldr r1, [r2] >> vmov s1, r1 >> vmov s0, r0 >> Here each ldr, vmov sequence could have been replaced by a simple vld1.32.
>> ** Proposed Solution ** >> Lower to more vector friendly code (using a sequence of insert_vector_elt), when bit casts will not be free. >> The attached patch demonstrates that, but is missing the proper check to know what DAG combine will do (see TODO). >> >> I think you're approaching this backwards: the obvious thing to do is to generate the insert_vector_elt sequence unconditionally, and DAGCombine that sequence to a build_vector when appropriate. > > Hi Eli, > > I have started to look into the direction you gave me. > > I may have missed something but I do not see how the proposed direction solves the issue. Indeed, to be able to DAGCombine an insert_vector_elt sequence into an ARMISD::build_vector, I still need to know if it would be profitable, i.e., if DAGCombine will remove the bitcasts that combining/lowering is about to insert. > > Since target specific DAGCombines are also done in TargetLowering I do not have access to more DAGCombine logic (at least DAGCombineInfo is not providing the required information). > > What did I miss? > > > Err, wait, sorry, my fault; I missed that you only insert the bitcasts on the other side of the branch. > > You should be able to do it the other way, though: generate the build_vector unconditionally, and pull insert_vector_elts out of it in a DAGCombine. (At this point, you know whether DAGCombine will remove the bit casts because if it could, it would have already done it.) > Thanks again Eli for the direction. That works great! I have sent a proposal with that solution. -Quentin -------------- next part -------------- An HTML attachment was scrubbed...
URL: From tanmx_star at yeah.net Mon Jul 1 19:04:43 2013 From: tanmx_star at yeah.net (Star Tan) Date: Tue, 2 Jul 2013 10:04:43 +0800 (CST) Subject: [LLVMdev] [Polly][GSOC2013] FastPolly -- SCOP Detection Pass In-Reply-To: <51D1A47A.1000801@grosser.es> References: <4033006f.2d60.13f92656fcc.Coremail.tanmx_star@yeah.net> <51CF7D1A.8020200@grosser.es> <4b8db8c3.4ccb.13f9a80ea9e.Coremail.tanmx_star@yeah.net> <51D1A47A.1000801@grosser.es> Message-ID: <6704c81e.64b5.13f9d1fef31.Coremail.tanmx_star@yeah.net> At 2013-07-01 23:47:06,"Tobias Grosser" wrote: >On 07/01/2013 06:51 AM, Star Tan wrote: >>> Great. Now we have two test cases we can work with. Can you >> >>> upload the LLVM-IR produced by clang -O0 (without Polly)? >> Since tramp3d-v4.ll is too large (19M with 267 thousand lines), I would focus on the oggenc benchmark at first. >> I attached the oggenc.ll (LLVM-IR produced by clang -O0 without Polly), which is compressed into the file oggenc.tgz. > >Sounds good. > >>> 2) Check why the Polly scop detection is failing >>> >>> You can use 'opt -polly-detect -analyze' to see the most common reasons >>> the scop detection failed. We should verify that we perform the most >>> common and cheap tests early. >>> >> I also attached the output file oggenc_polly_detect_analyze.log produced by "polly-opt -O3 -polly-detect -analyze oggenc.ll". Unfortunately, it only dumps valid scop regions. At first, I thought to dump all debugging information with the "-debug" option, but it will dump too much unrelated information produced by other passes. Do you know any option that allows me to dump debugging information for the "-polly-detect" pass, while disabling debugging information for other passes? > >I really propose to not attach such large files. ;-) > >To dump debug info of just one pass you can use >-debug-only=polly-detect. However, for performance measurements, you >want to use >a release build to get accurate numbers.
> >Another flag that is interesting is the flag '-stats'. It gives me the >following information:
>
>     4 polly-detect - Number of bad regions for Scop: CFG too complex
>   183 polly-detect - Number of bad regions for Scop: Expression not affine
>   103 polly-detect - Number of bad regions for Scop: Found base address alias
>   167 polly-detect - Number of bad regions for Scop: Found invalid region entering edges
>    59 polly-detect - Number of bad regions for Scop: Function call with side effects appeared
>   725 polly-detect - Number of bad regions for Scop: Loop bounds can not be computed
>    93 polly-detect - Number of bad regions for Scop: Non canonical induction variable in loop
>     8 polly-detect - Number of bad regions for Scop: Others
>    53 polly-detect - Number of regions that a valid part of Scop
>
>This seems to suggest that most scops fail due to loop bounds that >can not be computed. It would be interesting to see what kind of >expressions these are. In case SCEV often does not deliver a result, >this may be one of the cases where bottom up scop detection would help >a lot, as outer regions are automatically invalidated if we can not get >a SCEV for the loop bounds of the inner regions. Thank you so much. This is what I need. I just want to know why these scops are invalid! > >However, I still have the feeling the test case is too large. You can >reduce it. I propose to first run opt with 'opt -O3 -polly >-disable-inlining -time-passes'. You then replace all function >definitions with >s/define internal/define/. After this preprocessing you can use a regexp >such as "'<,'>s/define \([^{}]* \){\_[^{}]*}/declare \1" to replace >function definitions with their declaration. You can use this to binary >search for functions that have a large overhead in ScopDetect time. > >I tried this a little, but realized that no matter if I removed the >first or the second part of a module, the relative scop-detect time >always went down.
This is surprising. If you see similar effects, it >would be interesting to investigate. No problem. I will try to reduce code size. Bests, Star Tan -------------- next part -------------- An HTML attachment was scrubbed... URL: From tanmx_star at yeah.net Mon Jul 1 19:09:22 2013 From: tanmx_star at yeah.net (Star Tan) Date: Tue, 2 Jul 2013 10:09:22 +0800 (CST) Subject: [LLVMdev] [LNT] Question about results reliability in LNT infrustructure In-Reply-To: <51D1A5DC.2000607@grosser.es> References: <601606fe.54cc.13f74d000d9.Coremail.tanmx_star@yeah.net> <51D1A5DC.2000607@grosser.es> Message-ID: <300c688.654b.13f9d2431ab.Coremail.tanmx_star@yeah.net> At 2013-07-01 23:53:00,"Tobias Grosser" wrote: >On 06/23/2013 11:12 PM, Star Tan wrote: >> Hi all, >> >> >> When we compare two testings, each of which is run with three samples, how would LNT show whether the comparison is reliable or not? >> >> >> I have seen that the function get_value_status in reporting/analysis.py uses a very simple algorithm to infer data status. For example, if abs(self.delta) <= (self.stddev * confidence_interval), then the data status is set as UNCHANGED. However, it is obviously not enough. For example, assuming both self.delta (e.g. 60%) and self.stddev (e.g. 50%) are huge, but self.delta is slightly larger than self.stddev, LNT will report to readers that the performance improvement is huge without considering the huge stddev. I think one way is to normalize the performance improvements by considering the stddev, but I am not sure whether it has been implemented in LNT. >> >> >> Could anyone give some suggestions that how can I find out whether the testing results are reliable in LNT? Specifically, how can I get the normalized performance improvement/regression by considering the stderr? > >Hi Star Tan, > >I just attached you some hacks I tried on the week-end. The attached >patch prints the confidence intervals in LNT. 
If you like you can take >them as an inspiration (not copy them directly) to print those values in your >lnt server. (The patches require scipy and numpy to be installed in your >python sandbox. This should be OK for our experiments, but we probably >do not want to reimplement those functions before upstreaming). Wonderful. I will integrate them into our lnt server. > >Also, as Anton suggested, it may make sense to rerun your experiments >with a larger number of samples. As the machine is currently not loaded >and we do not track individual commits, 10 samples should probably be >good enough. OK, I can rerun all tests with 10 samples tonight :-). > >Cheers, >Tobias Bests, Star Tan. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tanmx_star at yeah.net Mon Jul 1 06:51:47 2013 From: tanmx_star at yeah.net (Star Tan) Date: Mon, 1 Jul 2013 21:51:47 +0800 (CST) Subject: [LLVMdev] [Polly][GSOC2013] FastPolly -- SCOP Detection Pass In-Reply-To: <51CF7D1A.8020200@grosser.es> References: <4033006f.2d60.13f92656fcc.Coremail.tanmx_star@yeah.net> <51CF7D1A.8020200@grosser.es> Message-ID: <4b8db8c3.4ccb.13f9a80ea9e.Coremail.tanmx_star@yeah.net> >Great. Now we have two test cases we can work with. Can you >upload the LLVM-IR produced by clang -O0 (without Polly)? Since tramp3d-v4.ll is too large (19M with 267 thousand lines), I would focus on the oggenc benchmark at first. I attached the oggenc.ll (LLVM-IR produced by clang -O0 without Polly), which is compressed into the file oggenc.tgz. >2) Check why the Polly scop detection is failing > >You can use 'opt -polly-detect -analyze' to see the most common reasons >the scop detection failed. We should verify that we perform the most >common and cheap tests early. > I also attached the output file oggenc_polly_detect_analyze.log produced by "polly-opt -O3 -polly-detect -analyze oggenc.ll". Unfortunately, it only dumps valid scop regions. 
At first, I thought of dumping all debugging information with the "-debug" option, but it will dump too much unrelated information produced by other passes. Do you know any option that allows me to dump debugging information for the "-polly-detect" pass, while at the same time disabling debugging information for other passes? Star Tan -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: oggenc.tgz Type: application/octet-stream Size: 657372 bytes Desc: not available URL: From pashev.igor at gmail.com Mon Jul 1 14:31:38 2013 From: pashev.igor at gmail.com (=?UTF-8?B?0JjQs9C+0YDRjCDQn9Cw0YjQtdCy?=) Date: Tue, 2 Jul 2013 01:31:38 +0400 Subject: [LLVMdev] [cfe-dev] [bikeshed] Anyone have strong feelings about always putting `template <...>` on its own line? In-Reply-To: References: Message-ID: 2013/7/2 Sean Silva : > always put `template <...>` on its own line. +1 From eda-qa at disemia.com Mon Jul 1 20:14:54 2013 From: eda-qa at disemia.com (edA-qa mort-ora-y) Date: Tue, 02 Jul 2013 05:14:54 +0200 Subject: [LLVMdev] intended use/structure of AllocA/local variables Message-ID: <51D245AE.8040408@disemia.com> I'm trying to determine what is the best approach to using AllocA. Currently I just allocate all my variables at the beginning of the function, but I am wondering if they should rather be done closer to the scope in which they are used. This would also require a way to do a free on the allocated structure, but there doesn't appear to be such a function. Is it the intent that all stack variables are alloc'd at the beginning of the function? I guess this has the advantage that LLVM then knows the full size of the function's stack at entry and can just use fixed offsets for all variables. It has the disadvantage that all potentially used stack variables are always allocated, which uses more stack memory than is actually required. What's the intended practice with AllocA? 
Can the optimizer determine when the use of two AllocA spaces is exclusive and share the stack space? -- edA-qa mort-ora-y -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- Sign: Please digitally sign your emails. Encrypt: I'm also happy to receive encrypted mail. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 261 bytes Desc: OpenPGP digital signature URL: From michyang_ysq at hotmail.com Mon Jul 1 21:21:56 2013 From: michyang_ysq at hotmail.com (=?gb2312?B?0e7KpMew?=) Date: Tue, 2 Jul 2013 04:21:56 +0000 Subject: [LLVMdev] How to build up static call graph for Android native code Message-ID: Hi everyone, Can LLVM be used to construct the static call graph for native code of Android platform? After reading the manual, I still don't quite get the solution. Thanks in advance for your help. Victor -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias at grosser.es Mon Jul 1 22:11:46 2013 From: tobias at grosser.es (Tobias Grosser) Date: Mon, 01 Jul 2013 22:11:46 -0700 Subject: [LLVMdev] [LNT] Question about results reliability in LNT infrustructure In-Reply-To: References: <601606fe.54cc.13f74d000d9.Coremail.tanmx_star@yeah.net> <51CC62CD.9060705@grosser.es> <3AD17769-0EBF-44EB-B84F-000E29131C3E@apple.com> <4BB1DA76-77CE-4FF1-8B89-45ED4810EDD3@apple.com> <429E8CF2-36A0-4546-B1DB-78929EEDB4A0@apple.com> <51cd57cb.aac6b40a.0fc4.0bc3SMTPIN_ADDED_BROKEN@mx.google.com> <51cd8a21.cc14b40a.1278.7259SMTPIN_ADDED_BROKEN@mx.google.com> <680FA65D-AF7C-44BC-8DCA-A018CF782609@apple.com> <51CF93AD.8060200@grosser.es> Message-ID: <51D26112.30507@grosser.es> On 07/01/2013 09:41 AM, Renato Golin wrote: > On 1 July 2013 02:02, Chris Matthews wrote: > >> One thing that LNT is doing to help “smooth” the results for you is by >> presenting the min of the data at a particular revision, which (hopefully) >> is approximating the actual runtime 
without noise. >> > > That's an interesting idea, as you said, if you run multiple times on every > revision. > > On ARM, every run takes *at least* 1h, other architectures might be a lot > worse. It'd be very important on those architectures if you could extract > point information from group data, and min doesn't fit in that model. You > could take min from a group of runs, but again, that's no different than > moving averages. Though, "moving mins" might make more sense than "moving > averages" for the reasons you explained. I get your point. On the other hand, it may be worth first getting statistically reliable and noise-free numbers with a lower resolution in terms of commits. Given those reliable numbers, we can then work on improving the resolution (without introducing noise). Also, multiple runs per revision should be easy to parallelize on different machines, such that confidence in the results seems to be a problem that can be solved by additional hardware. > Also, on tests that take as long as noise to run (0.010s or less on A15), > the minimum is not relevant, since runtime will flatten everything under > 0.010 onto 0.010, making your test always report 0.010, even when there are > regressions. > > I really cannot see how you can statistically enhance data in a scenario > where the measuring rod is larger than the signal. We need to change the > wannabe-benchmarks to behave like proper benchmarks, and move everything > else into "Applications" for correctness and specifically NOT time them. > Less is more. It is beyond question that we cannot improve the existing data, but it would be great to at least reliably detect that some data is just plain noise. > That works well with a lot of samples per revision, but not for across >> revisions, where we really need the smoothing. One way to explore this is >> to turn >> > > I was really looking forward to hearing the end of that sentence... 
;) > > > > We also lack any way to coordinate or annotate regressions, that is a whole >> separate problem though. >> > > Yup. I'm having visions of tag clouds, bugzilla integration, cross > architectural regression detection, etc. But I'll ignore that for now, > let's solve one big problem at a time. ;) Yes, there is a lot of stuff that would really help. Tobi From baldrick at free.fr Tue Jul 2 00:46:22 2013 From: baldrick at free.fr (Duncan Sands) Date: Tue, 02 Jul 2013 09:46:22 +0200 Subject: [LLVMdev] intended use/structure of AllocA/local variables In-Reply-To: <51D245AE.8040408@disemia.com> References: <51D245AE.8040408@disemia.com> Message-ID: <51D2854E.5070305@free.fr> Hi, On 02/07/13 05:14, edA-qa mort-ora-y wrote: > I'm trying to determine what is the best approach to using AllocA. > Currently I just allocate all my variables at the beginning of the > function but wondering if they should rather be done closer to the scope > they are used. This would also require a way to do a free on the > allocated structure, but there doesn't appear to be such a function. just allocate them at the start and use lifetime intrinsics to tell LLVM when they are actually in use. > > Is it the intent that all stack variables are alloc'd at the beginning > of the function? I guess this has the advantage that LLVM then knows the > full size of the function's stack at entry and can just use fixed > offsets for all variables. It has the disadvantage that all potentially > used stack variables are always allocated, which uses more stack memory > than is actually required. > > What's the intended practice with AllocA? > > Can the optimizer determine when the use of two AllocA spaces is > exclusive and share the stack space? The code generators will reuse stack space if it can. You can help it out using the lifetime intrinsics mentioned above. Ciao, Duncan. 
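Duncan's suggestion above — entry-block allocas bracketed by lifetime markers — can be sketched in LLVM 3.3-era IR as follows (the function name, buffer size, and marker placement are illustrative, not taken from the thread):

```llvm
define void @example() {
entry:
  ; all allocas grouped in the entry block, as suggested
  %tmp = alloca [64 x i8]
  %p = bitcast [64 x i8]* %tmp to i8*
  call void @llvm.lifetime.start(i64 64, i8* %p)  ; %tmp becomes live here
  ; ... code that uses %tmp ...
  call void @llvm.lifetime.end(i64 64, i8* %p)    ; %tmp is dead past this point
  ret void
}

declare void @llvm.lifetime.start(i64, i8* nocapture)
declare void @llvm.lifetime.end(i64, i8* nocapture)
```

With the markers in place, the code generator can overlap the stack slots of allocas whose live ranges are disjoint, addressing the stack-sharing question above.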
From xtxwy.ustc at gmail.com Tue Jul 2 00:49:49 2013 From: xtxwy.ustc at gmail.com (maxs) Date: Tue, 02 Jul 2013 15:49:49 +0800 Subject: [LLVMdev] [Loop Vectorize] Question on -O3 In-Reply-To: References: Message-ID: <51D2861D.8000102@gmail.com> Hi, When I use "-loop-vectorize" to vectorize a loop as below: //==================================== void bar(float *A, float* B, float K, int start, int end) { for (int i = start; i < end; ++i) A[i] *= B[i] + K; } //==================================== First, I use "*clang -O0 -emit-llvm -S bar.c -o bar.l*" to emit the .l file. Then I use "*opt -loop-vectorize -S bar.l -o bar.v.l*". Unfortunately, the vectorization doesn't work. But I use "*opt -O3 -loop-vectorize -S bar.l -o bar.v.l*" and it works. My question is: what information needed by "*-loop-vectorize*" does "*-O3*" provide? Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jobnoorman at gmail.com Tue Jul 2 02:34:23 2013 From: jobnoorman at gmail.com (Job Noorman) Date: Tue, 02 Jul 2013 11:34:23 +0200 Subject: [LLVMdev] Problem selecting the correct registers for a calling convention In-Reply-To: References: <3093493.QosMefQUkD@squat> Message-ID: <4640059.eBTrZfbOFX@squat> Hi Tim, I finally found some time to work on this issue again and I'm currently trying your suggestion: implementing the calling convention in C++. I have the feeling that isSplit is not enough to deal with my problem since there seems to be no way to tell if later arguments belong to the split or not. For example, a call to a function f(i32, i16, i16) will look exactly the same as one to g(i64): > Arg: 0, isSplit: 1 > Arg: 1, isSplit: 0 > Arg: 2, isSplit: 0 > Arg: 3, isSplit: 0 The information I need seems to be available in the InputArg class (OrigArgIndex and PartOffset) but unfortunately, only its Flags member is passed to the calling convention implementation. Am I missing something here? 
Regards, Job On Tuesday 27 November 2012 15:50:38 Tim Northover wrote: > Hi Job, > > > This issue is basically that I cannot find a way to distinguish two i16 > > arguments from one i32. Is there a way to do this in LLVM? Preferably > > using > > tablegen, of course:-) > > I think the property you want is "isSplit" (or, from the TableGen side > CCIfSplit). > > This gets applied to the first of those i16s that are produced. > Unfortunately I can't think of much you can do from TableGen to swap > the registers around (CCIfSplit is useful if the i32 would have to > start at an even-numbered register, for example). But in the C++ > ISelLowering code you can use that flag to deal with the two registers > together in a sane way. > > Tim. From baldrick at free.fr Tue Jul 2 04:00:57 2013 From: baldrick at free.fr (Duncan Sands) Date: Tue, 02 Jul 2013 13:00:57 +0200 Subject: [LLVMdev] [Loop Vectorize] Question on -O3 In-Reply-To: <51D2861D.8000102@gmail.com> References: <51D2861D.8000102@gmail.com> Message-ID: <51D2B2E9.4060002@free.fr> Hi maxs, On 02/07/13 09:49, maxs wrote: > Hi, > When I use "-loop-vectorize" to vectorize a loop as below: > //==================================== > void bar(float *A, float* B, float K, int start, int end) { > for (int i = start; i < end; ++i) > A[i] *= B[i] + K; > } > //==================================== > First, I use "*clang -O0 -emit-llvm -S bar.c -o bar.l*" to emit the .l file. > Then I use "*opt -loop-vectorize -S bar.l -o bar.v.l*". Unfortunately, the > vectorization don't work. But I use "*opt -O3 -loop-vectorize -S bar.l -o > bar.v.l*" and it works. > My question is: What the information needed by "*-loop-vectorize*" the > "*-O3*" provides? Thanks. all of the advanced LLVM optimizations assume that the IR has already been cleaned up already by the less advanced optimizers. Try running something like -sroa -instcombine -simplifycfg first. Ciao, Duncan. 
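To illustrate Duncan's point: at -O0, every variable in bar lives in an alloca and the vectorizer cannot see the induction variable. After -sroa -instcombine -simplifycfg, the loop is in clean SSA form, roughly like the following (a hand-written approximation in LLVM 3.3 syntax, not actual opt output):

```llvm
define void @bar(float* %A, float* %B, float %K, i32 %start, i32 %end) {
entry:
  %guard = icmp slt i32 %start, %end
  br i1 %guard, label %loop, label %exit

loop:
  ; the induction variable is now an explicit phi the vectorizer can analyze
  %i = phi i32 [ %start, %entry ], [ %i.next, %loop ]
  %b.ptr = getelementptr inbounds float* %B, i32 %i
  %b = load float* %b.ptr, align 4
  %sum = fadd float %b, %K
  %a.ptr = getelementptr inbounds float* %A, i32 %i
  %a = load float* %a.ptr, align 4
  %mul = fmul float %a, %sum
  store float %mul, float* %a.ptr, align 4
  %i.next = add nsw i32 %i, 1
  %done = icmp eq i32 %i.next, %end
  br i1 %done, label %exit, label %loop

exit:
  ret void
}
```

It is this kind of canonical loop, with a recognizable phi and computable trip count, that -loop-vectorize expects as input.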
From renato.golin at linaro.org Tue Jul 2 04:12:32 2013 From: renato.golin at linaro.org (Renato Golin) Date: Tue, 2 Jul 2013 12:12:32 +0100 Subject: [LLVMdev] [Loop Vectorize] Question on -O3 In-Reply-To: <51D2B2E9.4060002@free.fr> References: <51D2861D.8000102@gmail.com> <51D2B2E9.4060002@free.fr> Message-ID: On 2 July 2013 12:00, Duncan Sands wrote: > all of the advanced LLVM optimizations assume that the IR has already been > cleaned up already by the less advanced optimizers. Try running something > like -sroa -instcombine -simplifycfg first. > There could be a warning on more advanced optimizations if the pre-requisites haven't run (as per individual call). And maybe a way to force the dependencies to run regardless of the On level. Not sure how this dependency system would be constructed, though. --renato -------------- next part -------------- An HTML attachment was scrubbed... URL: From baldrick at free.fr Tue Jul 2 04:26:48 2013 From: baldrick at free.fr (Duncan Sands) Date: Tue, 02 Jul 2013 13:26:48 +0200 Subject: [LLVMdev] [Loop Vectorize] Question on -O3 In-Reply-To: References: <51D2861D.8000102@gmail.com> <51D2B2E9.4060002@free.fr> Message-ID: <51D2B8F8.80902@free.fr> Hi Renato, On 02/07/13 13:12, Renato Golin wrote: > On 2 July 2013 12:00, Duncan Sands > > wrote: > > all of the advanced LLVM optimizations assume that the IR has already been > cleaned up already by the less advanced optimizers. Try running something > like -sroa -instcombine -simplifycfg first. > > > There could be a warning on more advanced optimizations if the pre-requisites > haven't run (as per individual call). And maybe a way to force the dependencies > to run regardless of the On level. Not sure how this dependency system would be > constructed, though. they aren't dependencies in the sense that they aren't needed for correct functioning. They are only needed to get decent results. 
But then you get into a minefield, since all kinds of optimizations can expose stuff that causes other optimizers to do stuff that causes other optimizers to do stuff that (... repeat many times) that ends up allowing the loop vectorizer to do more. Anyway, since "opt" is a developer tool I think it is reasonable to require people to understand stuff rather than trying to have it all happen automagically (such an automagic system wouldn't be useful for clang and other frontends anyway, so in a sense would just represent pointless complexity). Ciao, Duncan. From renato.golin at linaro.org Tue Jul 2 04:47:40 2013 From: renato.golin at linaro.org (Renato Golin) Date: Tue, 2 Jul 2013 12:47:40 +0100 Subject: [LLVMdev] [Loop Vectorize] Question on -O3 In-Reply-To: <51D2B8F8.80902@free.fr> References: <51D2861D.8000102@gmail.com> <51D2B2E9.4060002@free.fr> <51D2B8F8.80902@free.fr> Message-ID: On 2 July 2013 12:26, Duncan Sands wrote: > Anyway, since "opt" is a developer tool I think it is reasonable to > require people to understand stuff rather than trying to have it all happen > automagically (such an automagic system wouldn't be useful for clang and > other > frontends anyway, so in a sense would just represent pointless complexity). > Yes. Opt should be bare, but Clang/llc could have some facility in that area. I agree they're not dependencies, but they might give the wrong impression when the "nice-to-have" passes didn't run... cheers, --renato -------------- next part -------------- An HTML attachment was scrubbed... URL: From borja.ferav at gmail.com Tue Jul 2 05:15:18 2013 From: borja.ferav at gmail.com (Borja Ferrer) Date: Tue, 2 Jul 2013 14:15:18 +0200 Subject: [LLVMdev] Problem selecting the correct registers for a calling convention Message-ID: Hello Job, I managed to resolve this same problem by using custom C++ code since as you mentioned the isSplit flag doesn't help here. 
There are 2 ways to analyze the arguments of a function: 1) You can get a Function pointer in LowerFormalArguments, and in LowerCall only when Callee can be dyn_cast'ed to a GlobalAddressSDNode. By having this pointer you can then do: for (Function::const_arg_iterator I = F->arg_begin(), E = F->arg_end(); I != E; ++I) { unsigned Bytes = TD->getTypeSizeInBits(I->getType()) / 8; // do stuff here } 2) The second case is when the dyn_cast above fails because the Callee SDValue is an ExternalSymbolSDNode. In this case you have to manually analyze the arguments using PartOffset. -------------- next part -------------- An HTML attachment was scrubbed... URL: From spop at codeaurora.org Tue Jul 2 08:18:17 2013 From: spop at codeaurora.org (Sebastian Pop) Date: Tue, 2 Jul 2013 10:18:17 -0500 Subject: [LLVMdev] [Polly][GSOC2013] FastPolly -- SCOP Detection Pass In-Reply-To: <4b8db8c3.4ccb.13f9a80ea9e.Coremail.tanmx_star@yeah.net> References: <4033006f.2d60.13f92656fcc.Coremail.tanmx_star@yeah.net> <51CF7D1A.8020200@grosser.es> <4b8db8c3.4ccb.13f9a80ea9e.Coremail.tanmx_star@yeah.net> Message-ID: <20130702151817.GA5885@codeaurora.org> Star Tan wrote: > I attached the oggenc.ll (LLVM-IR produced by clang -O0 without Polly), which is compressed into the file oggenc.tgz. Let me repeat what Tobi said: please do *not* send out large files to the mailing lists. Thanks, Sebastian -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation From jeffbush001 at gmail.com Tue Jul 2 09:12:10 2013 From: jeffbush001 at gmail.com (Jeff Bush) Date: Tue, 2 Jul 2013 09:12:10 -0700 Subject: [LLVMdev] Convert the result of a vector comparison into a scalar bit mask? In-Reply-To: References: Message-ID: On Sun, Jun 30, 2013 at 11:14 PM, Jeff Bush wrote: > When LLVM does a comparison of two vectors, in this case with 16 > elements, the returned type of setcc is v16i1. 
The architecture I'm > targeting allows storing the result of a vector comparison as a bit > mask in a scalar register, but I'm having trouble converting the > result of setcc into a value that is usable there. For example, if I > try to AND together masks that are the results of two comparisons, it > can't select an instruction because the operand types are v16i1 and no > instructions can deal with that. I don't want to have to modify every > instruction to be aware of v16i1 as a data type (which doesn't seem > right anyway). Ideally, I could just tell the backend to treat the > result of a vector setcc as an i32. I've tried a number of things, > including: > > - Using setOperationAction for SETCC to Promote and set the Promote > type to i32. It asserts internally because it tries to do a sext > operation on the result, which is incompatible. > > - Using a custom lowering action to wrap the setcc in a combination of > BITCAST/ZERO_EXTEND nodes (which I could match and eliminate in the > instruction pattern). However those DAG nodes get removed during one > of the passes and the result type is still v16i1. After some thought, I realize that the second approach doesn't work because the operation would be applied to each element in the vector (thus the result is still a vector). There doesn't appear to be a promotion type that will pack a vector. 
I tried adding a lowering that will transform SETCC into a custom node that returns a scalar: SDValue VectorProcTargetLowering::LowerSETCC(SDValue Op, SelectionDAG &DAG) const { return DAG.getNode(SPISD::VECTOR_COMPARE, Op.getDebugLoc(), MVT::i32, Op.getOperand(0), Op.getOperand(1), Op.getOperand(2)); } def veccmp : SDNode<"SPISD::VECTOR_COMPARE", SDTypeProfile<1, 1, [SDTCisInt<0>, SDTCisSameAs<1, 2>, SDTCisVec<1>]>>; And changing the pattern that matches vector comparisons to: [(set i32:$dst, (veccmp v16i32:$a, v16i32:$b, condition))] Unfortunately, this ends up tripping an assert: Assertion failed: (Op.getValueType().getScalarType().getSizeInBits() == BitWidth && "Mask size mismatches value type size!"), function SimplifyDemandedBits, file llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp, line 357. (another variant of this would be to keep the setcc and wrap it with a custom node 'PACK_VECTOR' that takes v16i1 as a param and returns i32 as a result). I'm not sure if I'm on the right track with this approach or not. Since I'm exposing this with a built-in in clang anyway (since there is no other way to do this in C that I know of), I could just punt entirely and use an intrinsic to expose packed vector comparisons. But that doesn't seem like the right thing to do. From shuxin.llvm at gmail.com Tue Jul 2 12:25:45 2013 From: shuxin.llvm at gmail.com (Shuxin Yang) Date: Tue, 02 Jul 2013 12:25:45 -0700 Subject: [LLVMdev] SCEV update problem Message-ID: <51D32939.3050505@gmail.com> Hi, We came across a ScalarEvolution (SE) problem the other day. It seems to be a fundamental design problem. I don't think I have a clean and cheap fix to this problem. I talked with Andy on the phone yesterday, and he told me it is a known fundamental problem. But I don't see any discussion on this problem on the list, so I post the problem here, soliciting your insightful comment. Many thanks in advance! 
Shuxin I don't know for sure if I can post the reduced *.ll (2k+ lines) to the list. Let me try my best to describe this problem without the *.ll. Let me start from 5k feet high. ================================== The relevant aspects of SCEV regarding the problem are: a1) SCEV is *NOT* self-contained. At least SCEVUnknown points to an llvm::Instruction. a2) SCEV must be in sync with the IR. One way to sync them is to delete/invalidate the SCEV, and force SE to recompute based on the most recent IR. A SCEVUnknown is invalidated if its corresponding llvm::Instruction is deleted. However, this does not invalidate those SCEVs that depend on it as well. a3) SCEV is hierarchically structured. So, if a SCEVUnknown is invalidated, the compiler needs to figure out who depends on this invalidated SCEVUnknown, and invalidate them accordingly. To enforce the consistency between SCEV and IR, the compiler currently: ec1) Resets SCEVUnknown::ThePtr via a callback function if the Instruction corresponding to the SCEVUnknown in question is deleted. ec2) Leaves it up to the optimizer, which changes the IR, to update the SCEV at once. However, there are flaws: ec1.f): for ec1), it only invalidates a SCEVUnknown. The SCEVs that depend on this SCEVUnknown are not automatically invalidated. ec2.f1) for ec2), it is sometimes difficult for the optimizer to (efficiently) figure out which SCEVs need to be invalidated. ec2.f2) If the transformation takes place in common utility functions, the optimizer has no control at all. If we change the utility functions to be fully SCEV-aware, we might as well promote SE to an essential part of the IR (Yuck!) ec1.f is hard to fix with negligible cost, and ec2.f1 and ec2.f2 make it hard to provide even a nasty hack. Okay, let us descend from 5k' down to the ground :-) ================================================ Let us consider this snippet ---------------------- E1 = ... E2 = expression of E1 // sext/zext/trunc or other simple expression of E1. loop1(...) { r1 = phi1(..., E2) = r1 // use 1 } loop2(...) 
{ r2 = phi2(... r1) = r2 // use 2; --------------------- o. At the beginning, both SCEVs of use1 and use2 are in the form of SCEV(use1) = ... E2 ... SCEV(use2) = ... E2 ... SE does not dig into E2 because the relationship between E1 and E2 is obscured by some instructions. o. loop2 is a bit dull, and loop1 is more interesting. So, the SCEVs corresponding to the expressions defined in loop1 are invalidated/updated frequently. On the other hand, as optimization progresses, the compiler now realizes that SCEV(r1) can be represented using E1. Now, we have SCEV(use1) = ... E1 ... SCEV(use2) = ... E2 ... o. LSR redefines phi1, and replaces all r1 occurrences, making phi1 and E2 dead; both are deleted. However, SCEV(use2) is still "... E2 ...". The only difference is that the SCEVUnknown corresponding to E2 has its pointer invalidated. o. A following loop pass queries SCEV(use2), tries to dereference E2, and gets a segmentation fault. From rmann at latencyzero.com Tue Jul 2 12:33:54 2013 From: rmann at latencyzero.com (Rick Mann) Date: Tue, 2 Jul 2013 12:33:54 -0700 Subject: [LLVMdev] clang static analyzer annotations Message-ID: Not sure if this is the right place to ask. Please let me know if there's a better place. I ran the clang static analyzer via Xcode 4.6.3 on our project that uses a lot of third-party libraries. One of them is Google protobufs, and it has a set of non-exiting assertion macros that the analyzer (rightly) ignores when flagging some NULL dereferences. The macros look like this: #define GOOGLE_LOG(LEVEL) \ ::google::protobuf::internal::LogFinisher() = \ ::google::protobuf::internal::LogMessage( \ ::google::protobuf::LOGLEVEL_##LEVEL, __FILE__, __LINE__) #define GOOGLE_LOG_IF(LEVEL, CONDITION) \ !(CONDITION) ? 
(void)0 : GOOGLE_LOG(LEVEL) #define GOOGLE_CHECK(EXPRESSION) \ GOOGLE_LOG_IF(FATAL, !(EXPRESSION)) << "CHECK failed: " #EXPRESSION ": " And the code might look like this: protobuf_AddDesc_manifest_2eproto(); const ::google::protobuf::FileDescriptor* file = ::google::protobuf::DescriptorPool::generated_pool()->FindFileByName( "manifest.proto"); GOOGLE_CHECK(file != NULL); Affine3f_descriptor_ = file->message_type(0); Resulting in a warning from the analyzer on the last line that file could be NULL. Is there a way to annotate these macros so that the analyzer doesn't flag the code? I realize that's not strictly correct, since the macros don't exit, but, for better or for worse, the code is checked, and the numerous warnings potentially hide other, more pernicious ones. Thanks! -- Rick From westdac at gmail.com Tue Jul 2 14:35:01 2013 From: westdac at gmail.com (Dan) Date: Tue, 2 Jul 2013 15:35:01 -0600 Subject: [LLVMdev] Encountering flt_rounds_ in llvm3.3 for newlib compilation Message-ID: I made the switch to llvm3.3, and encountered a flt_rounds problem. I'm using a soft float architecture and hopefully people have some ideas on how to help: I received: i32 = flt_rounds "Do not know how to promote this operator!" I currently do not have any custom setting for the FLT_ROUNDS_. I'd like to just replace the FLT_ROUNDS_ with a "1" value. Any thoughts on how to do this or mimic this from other places? I see that other targets have hardware support for getting the actual hardware setting of rounding mode, but like I said, I'm using a soft float architecture and this didn't come up during 3.2 compilation. thanks, Dan From ghassan_shobaki at yahoo.com Tue Jul 2 14:35:38 2013 From: ghassan_shobaki at yahoo.com (Ghassan Shobaki) Date: Tue, 2 Jul 2013 14:35:38 -0700 (PDT) Subject: [LLVMdev] MI Scheduler vs SD Scheduler? 
In-Reply-To: References: <1372448337.69444.YahooMailNeo@web125505.mail.ne1.yahoo.com> Message-ID: <1372800938.86583.YahooMailNeo@web125506.mail.ne1.yahoo.com> Thank you for the answers! We are currently trying to test the MI scheduler. We are using LLVM 3.3 with Dragon Egg 3.3 on an x86-64 machine. So far, we have run one SPEC CPU2006 test with the MI scheduler enabled using the option -fplugin-arg-dragonegg-llvm-option='-enable-misched:true' with -O3. This enables the machine scheduler in addition to the SD scheduler. We have verified this by adding print messages to the source code of both schedulers. In terms of correctness, enabling the MI scheduler did not cause any failure. However, in terms of performance, we have seen a mix of small positive and negative differences with the geometric mean difference being near zero. The maximum improvement that we have seen is 3% on the Gromacs benchmark.  Is this consistent with your test results? We have then tried to run a test in which the MI scheduler is enabled but the SD scheduler is disabled (or neutralized) by adding the option: -fplugin-arg-dragonegg-llvm-option='-pre-RA-sched:source' to the flags that we have used in the first test. However, this did not work; we got the following error message: GCC_4.6.4_DIR/install/bin/gcc -c -o lbm.o -DSPEC_CPU -DNDEBUG    -O3 -march=core2 -mtune=core2 -fplugin='DRAGON_EGG_DIR/dragonegg.so' -fplugin-arg-dragonegg-llvm-option='-enable-misched:true' -fplugin-arg-dragonegg-llvm-option='-pre-RA-sched:source'       -DSPEC_CPU_LP64         lbm.c cc1: for the -pre-RA-sched option: may only occur zero or one times! specmake: *** [lbm.o] Error 1 What does this message mean? Is this a bug or we are doing something wrong? How can we test the MI scheduler by itself? Is it interesting to test 3.3 or there are interesting features that were added to the trunk after branching 3.3? In the latter case, we are willing to test the trunk. 
Thanks Ghassan Shobaki Assistant Professor Department of Computer Science Princess Sumaya University for Technology Amman, Jordan ________________________________ From: Andrew Trick To: Ghassan Shobaki Cc: "llvmdev at cs.uiuc.edu" Sent: Monday, July 1, 2013 8:10 PM Subject: Re: MI Scheduler vs SD Scheduler? Sent from my iPhone On Jun 28, 2013, at 2:38 PM, Ghassan Shobaki wrote: Hi, > > >We are currently in the process of upgrading from LLVM 2.9 to LLVM 3.3. We are working on instruction scheduling (mainly for register pressure reduction). I have been following the llvmdev mailing list and have learned that a machine instruction (MI) scheduler has been implemented to replace (or work with?) the selection DAG (SD) scheduler. However, I could not find any document that describes the new MI scheduler and how it differs from and relates to the SD scheduler. MI is now the place to implement any heuristics for profitable scheduling. The SD scheduler will be directly replaced by a new pass that orders the DAG as close as it can to IR order. We currently emulate this with -pre-RA-sched=source. The only thing necessarily different about MI sched is that it runs after reg coalescing and before reg alloc, and maintains live interval analysis. As a result, register pressure tracking is more accurate. It also uses a new target interface for precise register pressure. MI sched is intended to be a convenient place to implement target specific scheduling. There is a generic implementation that uses standard heuristics to reduce register pressure and balance latency and CPU resources. That is what you currently get when you enable MI sched for x86. The generic heuristics are implemented as a priority function that makes a greedy choice over the ready instructions based on the current pressure and the resources and latency of the scheduled and unscheduled set of instructions. A DAG subtree analysis also exists (ScheduleDFS), which can be used for register pressure avoidance. 
This isn't hooked up to the generic heuristics yet for lack of interesting test cases. So, I would appreciate any pointer to a document (or a blog) that may help us understand the difference and the relation between the two schedulers and figure out how to deal with them. We are trying to answer the following questions: > > >- A comment at the top of the file ScheduleDAGInstrs says that this file implements re-scheduling of machine instructions. So, what does re-scheduling mean? Rescheduling just means optional scheduling. That's really what the comment should say. It's important to know that MI sched can be skipped for faster compilation. Does it mean that the real scheduling algorithms (such as reg pressure reduction) are currently implemented in the SD scheduler, while the MI scheduler does some kind of complementary work (fine tuning) at a lower-level representation of the code? >And what's the future plan? Is it to move the real scheduling algorithms into the MI scheduler and get rid of the SD scheduler? Will that happen in 3.4 or later? > I would like to get rid of the SD scheduler so we can reduce compile time by streamlining the scheduling data structures and interfaces. There may be some objection to doing that in 3.4 if projects haven't been able to migrate. It will be deprecated though. > >- Based on our initial investigation of the default behavior at -O3 on x86-64, it appears that the SD scheduler is called while the MI scheduler is not. That's consistent with the above interpretation of re-scheduling, but I'd appreciate any advice on what we should do at this point. Should we integrate our work (an alternate register pressure reduction scheduler) into the SD scheduler or the MI scheduler? > Please refer to my recent messages on llvmdev regarding enabling MI scheduling by default on x86. http://article.gmane.org/gmane.comp.compilers.llvm.devel/63242/match=machinescheduler I suggest integrating with the MachineScheduler pass.
There are many places to plug in. MachineSchedRegistry provides the hook. At that point you can define your own ScheduleDAGInstrs or ScheduleDAGMI subclass. People who only want to define new heuristics should reuse ScheduleDAGMI directly and only define their own MachineSchedStrategy. > >- Our SPEC testing on x86-64 has shown a significant performance improvement of LLVM 3.3 relative to LLVM 2.9 (about 5% in geomean on INT2006 and 15% in geomean on FP2006), but our spill code measurements have shown that LLVM 3.3 generates significantly more spill code on most benchmarks. We will be doing more investigation on this, but are there any known facts that explain this behavior? Is this caused by a known regression in scheduling and/or allocation (which I doubt) or by the implementation (or enabling) of some new optimization(s) that naturally increase(s) register pressure? > >There is not a particular known regression. It's not surprising that optimizations increase pressure. Andy Thank you in advance! > > >Ghassan Shobaki > >Assistant Professor > >Department of Computer Science > >Princess Sumaya University for Technology > >Amman, Jordan > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rasha.sala7 at gmail.com Wed Jul 3 02:28:27 2013 From: rasha.sala7 at gmail.com (Rasha Omar) Date: Wed, 3 Jul 2013 11:28:27 +0200 Subject: [LLVMdev] construct new function Message-ID: What are the steps to construct a function of some basic blocks without any arguments and void return with new control flow graph that is different from the original function thanks *Rasha Salah Omar Msc Student at E-JUST Demonestrator at Faculty of Computers and Informatics Benha University * -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jobnoorman at gmail.com Wed Jul 3 06:37:31 2013 From: jobnoorman at gmail.com (Job Noorman) Date: Wed, 03 Jul 2013 15:37:31 +0200 Subject: [LLVMdev] Problem selecting the correct registers for a calling convention In-Reply-To: References: Message-ID: <2917139.5TDfQTt6ff@squat> Hi Borja, Thanks a lot! Looking at your implementation in the AVR backend has helped me solve the problem for the MSP430. Regards, Job On Tuesday 02 July 2013 14:15:18 Borja Ferrer wrote: > Hello Job, > > I managed to resolve this same problem by using custom C++ code since as > you mentioned > the isSplit flag doesn't help here. There are 2 ways to analyze the > arguments of a function: > > 1) You can get a Function pointer in LowerFormalArguments, and in LowerCall > only when Callee can be dyn_casted to a GlobalAddressSDNode. By having this > pointer you can then do: > > for (Function::const_arg_iterator I = F->arg_begin(), E = F->arg_end();I != > E; ++I) > > { > unsigned Bytes = TD->getTypeSizeInBits(I->getType()) / 8; > // do stuff here > } > > > 2) The second case is when the dyn_cast above fails because the Callee > SDValue is a > > ExternalSymbolSDNode. In this case you have to manually analyze the > arguments using PartOffset. From justin.holewinski at gmail.com Wed Jul 3 06:50:35 2013 From: justin.holewinski at gmail.com (Justin Holewinski) Date: Wed, 3 Jul 2013 09:50:35 -0400 Subject: [LLVMdev] Tablegen bug??? In-Reply-To: <24855E2D-7E13-45A4-8B2B-78547C87B6B2@apple.com> References: <1FFC8463B1B7D945AB7511574438BE1F1DD0EF3C@sausexdag05.amd.com> <1FFC8463B1B7D945AB7511574438BE1F1DD0F063@sausexdag05.amd.com> <24855E2D-7E13-45A4-8B2B-78547C87B6B2@apple.com> Message-ID: Was a fix for this ever applied to trunk? On Fri, Nov 30, 2012 at 11:08 PM, Chris Lattner wrote: > Yes, that definitely sounds like a bug; no intrinsics in mainline are a > prefix of another one, or we are getting lucky.
> > -Chris > > On Nov 29, 2012, at 7:24 PM, "Relph, Richard" > wrote: > > > If the source being scanned has "llvm.AMDIL.barrier.global, it will > match the first barrier test and return AMDIL_barrier, not > AMDIL_barrier_global. > > > > > > On Nov 29, 2012, at 7:19 PM, Chris Lattner > > wrote: > > > >> Out of curiosity, what is wrong about that? It looks ok to me. > >> > >> -Chris > >> > >> On Nov 29, 2012, at 6:52 PM, "Relph, Richard" > wrote: > >> > >>> Should tablegen detect this as an error, or is it documented as a > limitation somewhere that we've missed? > >>> > >>> In the tablegen-generated file AMDILGenIntrinsics.inc, we have a > bunch of if statements comparing strings, many of which are dead, > preventing correct recognition of some intrinsics in the their text form. > I'm not quite sure what GET_FUNCTION_RECOGNIZER is used for, but if it's > used, it's broken… ;-) > >>> > >>> Here's a small segment: > >>> > >>> // Function name -> enum value recognizer code. > >>> #ifdef GET_FUNCTION_RECOGNIZER > >>> StringRef NameR(Name+6, Len-6); // Skip over 'llvm.' > >>> switch (Name[5]) { // Dispatch on first letter. 
> >>> default: break; > >>> case 'A': > >>> … > >>> if (NameR.startswith("MDIL.barrier.")) return > AMDILIntrinsic::AMDIL_barrier; > >>> if (NameR.startswith("MDIL.barrier.global.")) return > AMDILIntrinsic::AMDIL_barrier_global; > >>> if (NameR.startswith("MDIL.barrier.local.")) return > AMDILIntrinsic::AMDIL_barrier_local; > >>> if (NameR.startswith("MDIL.barrier.region.")) return > AMDILIntrinsic::AMDIL_barrier_region; > >>> … > >>> if (NameR.startswith("MDIL.fma.")) return AMDILIntrinsic::AMDIL_fma; > >>> if (NameR.startswith("MDIL.fma.rte.")) return > AMDILIntrinsic::AMDIL_fma_rte; > >>> if (NameR.startswith("MDIL.fma.rtn.")) return > AMDILIntrinsic::AMDIL_fma_rtn; > >>> if (NameR.startswith("MDIL.fma.rtp.")) return > AMDILIntrinsic::AMDIL_fma_rtp; > >>> if (NameR.startswith("MDIL.fma.rtz.")) return > AMDILIntrinsic::AMDIL_fma_rtz; > >>> … > >>> and several other similar instances. > >>> > >>> _______________________________________________ > >>> LLVM Developers mailing list > >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >> > > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -- Thanks, Justin Holewinski -------------- next part -------------- An HTML attachment was scrubbed... URL: From xbaruc00 at stud.fit.vutbr.cz Wed Jul 3 06:20:28 2013 From: xbaruc00 at stud.fit.vutbr.cz (=?UTF-8?B?Um9iZXJ0IEJhcnXEjcOhaw==?=) Date: Wed, 03 Jul 2013 15:20:28 +0200 Subject: [LLVMdev] CallGraph in immutable pass Message-ID: <51D4251C.5020108@stud.fit.vutbr.cz> Hello, is there any way I can access CallGraph from immutable pass via getAnalysis? As I understand it, this may not be possible, because immutable pass don't have runOn method and is never actually planned. But I'm not 100% sure how this works, so I don't know if there is some other way. 
Thanks, Robert Barucak From chandlerc at google.com Wed Jul 3 09:18:27 2013 From: chandlerc at google.com (Chandler Carruth) Date: Wed, 3 Jul 2013 09:18:27 -0700 Subject: [LLVMdev] CallGraph in immutable pass In-Reply-To: <51D4251C.5020108@stud.fit.vutbr.cz> References: <51D4251C.5020108@stud.fit.vutbr.cz> Message-ID: Your understanding is correct -- this is impossible. What are you really trying to do? On Jul 3, 2013 9:15 AM, "Robert Baručák" wrote: > Hello, > is there any way I can access CallGraph from immutable pass via > getAnalysis? > As I understand it, this may not be possible, because immutable pass don't > have runOn method and is never actually planned. But I'm not 100% sure how > this works, so I don't know if there is some other way. > Thanks, > Robert Barucak > ______________________________**_________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/**mailman/listinfo/llvmdev > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From slarin at codeaurora.org Wed Jul 3 10:29:09 2013 From: slarin at codeaurora.org (Sergei Larin) Date: Wed, 3 Jul 2013 12:29:09 -0500 Subject: [LLVMdev] [LNT] Question about results reliability in LNT infrustructure In-Reply-To: <51D26112.30507@grosser.es> References: <601606fe.54cc.13f74d000d9.Coremail.tanmx_star@yeah.net> <51CC62CD.9060705@grosser.es> <3AD17769-0EBF-44EB-B84F-000E29131C3E@apple.com> <4BB1DA76-77CE-4FF1-8B89-45ED4810EDD3@apple.com> <429E8CF2-36A0-4546-B1DB-78929EEDB4A0@apple.com> <51cd57cb.aac6b40a.0fc4.0bc3SMTPIN_ADDED_BROKEN@mx.google.com> <51cd8a21.cc14b40a.1278.7259SMTPIN_ADDED_BROKEN@mx.google.com> <680FA65D-AF7C-44BC-8DCA-A018CF782609@apple.com> <51CF93AD.8060200@grosser.es> <51D261 12.30507@grosser.es> Message-ID: <006001ce7812$d4504d60$7cf0e820$@codeaurora.org> Tobias, I seem to trigger an assert in Polly lib/Analysis/TempScopInfo.cpp void TempScopInfo::buildAffineCondition(Value &V, bool inverted, Comparison **Comp) const { ... ICmpInst *ICmp = dyn_cast(&V); assert(ICmp && "Only ICmpInst of constant as condition supported!"); ... The code it chokes on looks like this (see below). The problem is this OR-ed compare result: %cmp3 = icmp sgt i32 %j.0, 2 %cmp5 = icmp eq i32 %j.0, 1 %or.cond13 = or i1 %cmp3, %cmp5 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< == Value V My question - is this a bug or a (missing) feature? ...and how it should be handled in theory? Thanks. 
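For intuition on the "in theory" part: an or-ed branch condition like the one above describes a union of affine constraint sets rather than a single affine inequality. As a sketch of the interpretation (not of Polly's implementation), the condition on %j.0 corresponds to:

```latex
\{\, j \in \mathbb{Z} \mid j > 2 \,\} \;\cup\; \{\, j \in \mathbb{Z} \mid j = 1 \,\}
```

A framework whose conditions must each be a single integer comparison (which is what the assert checks for) would have to either split such a disjunction into a union of polyhedra or reject the region.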
Sergei define i32 @main() #0 { entry: %j.0.lcssa.reg2mem = alloca i32, align 8 br label %entry.split entry.split: ; preds = %entry %call = tail call i32 @foo(i32 0, i32 0) #2 %call1 = tail call i32 @foo(i32 %call, i32 %call) #2 br label %for.cond2 for.cond2: ; preds = %for.inc, %entry.split %j.0 = phi i32 [ 0, %entry.split ], [ %inc, %for.inc ] %cmp3 = icmp sgt i32 %j.0, 2 %cmp5 = icmp eq i32 %j.0, 1 %or.cond13 = or i1 %cmp3, %cmp5 store i32 %j.0, i32* %j.0.lcssa.reg2mem, align 8 br i1 %or.cond13, label %for.end8, label %for.inc, !llvm.listen.preserve.while.opt !0 for.inc: ; preds = %for.cond2 %inc = add nsw i32 %j.0, 1 br label %for.cond2 for.end8: ; preds = %for.cond2 %j.0.lcssa.reload = load i32* %j.0.lcssa.reg2mem, align 8 %cmp10 = icmp eq i32 %j.0.lcssa.reload, 1 %add = add nsw i32 %j.0.lcssa.reload, 1 %retval.0 = select i1 %cmp10, i32 1, i32 %add ret i32 %retval.0 } --- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation From slarin at codeaurora.org Wed Jul 3 10:36:11 2013 From: slarin at codeaurora.org (Sergei Larin) Date: Wed, 3 Jul 2013 12:36:11 -0500 Subject: [LLVMdev] [Polly] Assert in Scope construction Message-ID: <006101ce7813$cf856260$6e902720$@codeaurora.org> Should have changed the subject line... --- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation > -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Sergei Larin > Sent: Wednesday, July 03, 2013 12:29 PM > To: 'Tobias Grosser' > Cc: 'llvmdev' > Subject: Re: [LLVMdev] [LNT] Question about results reliability in LNT > infrustructure > > > Tobias, > > I seem to trigger an assert in Polly lib/Analysis/TempScopInfo.cpp > > void TempScopInfo::buildAffineCondition(Value &V, bool inverted, > Comparison **Comp) const { ... > ICmpInst *ICmp = dyn_cast(&V); > assert(ICmp && "Only ICmpInst of constant as condition supported!"); ... 
> > The code it chokes on looks like this (see below). The problem is this OR-ed > compare result: > > %cmp3 = icmp sgt i32 %j.0, 2 > %cmp5 = icmp eq i32 %j.0, 1 > %or.cond13 = or i1 %cmp3, %cmp5 > <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< == > Value V > > > My question - is this a bug or a (missing) feature? ...and how it should be > handled in theory? > > Thanks. > > Sergei > > define i32 @main() #0 { > entry: > %j.0.lcssa.reg2mem = alloca i32, align 8 > br label %entry.split > > entry.split: ; preds = %entry > %call = tail call i32 @foo(i32 0, i32 0) #2 > %call1 = tail call i32 @foo(i32 %call, i32 %call) #2 > br label %for.cond2 > > for.cond2: ; preds = %for.inc, > %entry.split > %j.0 = phi i32 [ 0, %entry.split ], [ %inc, %for.inc ] > %cmp3 = icmp sgt i32 %j.0, 2 > %cmp5 = icmp eq i32 %j.0, 1 > %or.cond13 = or i1 %cmp3, %cmp5 > store i32 %j.0, i32* %j.0.lcssa.reg2mem, align 8 > br i1 %or.cond13, label %for.end8, label %for.inc, > !llvm.listen.preserve.while.opt !0 > > for.inc: ; preds = %for.cond2 > %inc = add nsw i32 %j.0, 1 > br label %for.cond2 > > for.end8: ; preds = %for.cond2 > %j.0.lcssa.reload = load i32* %j.0.lcssa.reg2mem, align 8 > %cmp10 = icmp eq i32 %j.0.lcssa.reload, 1 > %add = add nsw i32 %j.0.lcssa.reload, 1 > %retval.0 = select i1 %cmp10, i32 1, i32 %add > ret i32 %retval.0 > } > > > --- > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, > hosted by The Linux Foundation > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From nazanin.calagar at gmail.com Wed Jul 3 11:00:36 2013 From: nazanin.calagar at gmail.com (Nazanin Calagar) Date: Wed, 3 Jul 2013 14:00:36 -0400 Subject: [LLVMdev] getting source-level debug info Message-ID: Hello, I have a problem getting source-level debug info using Clang and LLVM 2.9 libraries. 
I compile the code with -g and -O3 (or even -O1) and I'm receiving the following error: LLVM ERROR: Code generator does not support intrinsic function 'llvm.dbg.value'! make: *** [all] Error 1 Which I found is in IntrinsicLowering.cpp and that's because there isn't any case for llvm.dbg.value. I want to ask if I'm missing something ? Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dblaikie at gmail.com Wed Jul 3 11:15:50 2013 From: dblaikie at gmail.com (David Blaikie) Date: Wed, 3 Jul 2013 11:15:50 -0700 Subject: [LLVMdev] getting source-level debug info In-Reply-To: References: Message-ID: On Wed, Jul 3, 2013 at 11:00 AM, Nazanin Calagar wrote: > Hello, > > I have a problem getting source-level debug info using Clang and LLVM 2.9 > libraries. 2.9 is a rather old release in our time frame - and essentially unsupported. Someone might know something about this, but there's a fair chance if anyone on the project knew we've long since forgotten. > I compile the code with -g and -O3 (or even -O1) and I'm > receiving the following error: > LLVM ERROR: Code generator does not support intrinsic function > 'llvm.dbg.value'! > make: *** [all] Error 1 > > Which I found is in IntrinsicLowering.cpp and that's because there isn't any > case for llvm.dbg.value. > I want to ask if I'm missing something ? > > Thank you. 
> > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From swlin at post.harvard.edu Wed Jul 3 15:05:15 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Wed, 3 Jul 2013 15:05:15 -0700 Subject: [LLVMdev] Docs question: legality of inspecting other functions in a function pass Message-ID: Hi, I'm not planning on doing this, but I noticed that the documentation in WritingAnLLVMPass.rst doesn't seem to specify whether or not it's legal for a function pass to inspect (and thus depend upon the contents of) other functions and I'm curious if this is an oversight or by design: To be explicit, ``FunctionPass`` subclasses are not allowed to: #. Modify a ``Function`` other than the one currently being processed. ... Whereas for basic block passes there is an explicit prohibition: ``BasicBlockPass``\ es are just like :ref:`FunctionPass's ` , except that they must limit their scope of inspection and modification to a single basic block at a time. As such, they are **not** allowed to do any of the following: #. Modify or inspect any basic blocks outside of the current one. ... Does anyone know if there's a defined policy about this, either way? If so, I think it ought to be noted in the docs, for consistency. Stephen From silvas at purdue.edu Wed Jul 3 15:56:10 2013 From: silvas at purdue.edu (Sean Silva) Date: Wed, 3 Jul 2013 15:56:10 -0700 Subject: [LLVMdev] Docs question: legality of inspecting other functions in a function pass In-Reply-To: References: Message-ID: On Wed, Jul 3, 2013 at 3:05 PM, Stephen Lin wrote: > > If so, I think it ought to be noted in the docs, for consistency. > Once you get your answer, definitely feel free to update the docs. -- Sean Silva -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stoklund at 2pi.dk Wed Jul 3 17:05:10 2013 From: stoklund at 2pi.dk (Jakob Stoklund Olesen) Date: Wed, 03 Jul 2013 17:05:10 -0700 Subject: [LLVMdev] EXCEPTIONADDR, EHSELECTION, and LSDAADDR ISD opcodes are gone Message-ID: <69B24032-44D9-4C3D-AB45-60774BE48A09@2pi.dk> All, I just committed r185596 which removes some exception-related ISD opcodes. If you have an out-of-tree target that supports DWARF exception handling, that probably broke your build. Just delete these lines from your XXXISelLowering.cpp file: - setOperationAction(ISD::EHSELECTION, MVT::i32, Expand); - setOperationAction(ISD::EXCEPTIONADDR, MVT::i32, Expand); The lowering code for DWARF landing pads now only needs these parameters: setExceptionPointerRegister(X86::EAX); setExceptionSelectorRegister(X86::EDX); Thanks, /jakob From luoyonggang at gmail.com Thu Jul 4 00:43:15 2013 From: luoyonggang at gmail.com (=?UTF-8?B?572X5YuH5YiaKFlvbmdnYW5nIEx1bykg?=) Date: Thu, 4 Jul 2013 15:43:15 +0800 Subject: [LLVMdev] Hi, people, I propose to move Debug and Object File related headers out of Support In-Reply-To: References: Message-ID: LLVM is a modularized software system, so I hope it stays modularized, and ELF.h definitely belongs to Object by classification. The thing that confused me is that ELF.h was placed under Support, and I don't know why. Indeed, I checked that the source files that include ELF.h also include files under the Object folder, so that's the reason to move ELF.h from Support to Object. 2013/6/29 Sean Silva : > On Fri, Jun 28, 2013 at 11:13 AM, Eric Christopher > wrote: >> >> Going to be interesting layering issues if you do the latter. Then you >> have CodeGen depending upon DebugInfo instead of just a header in >> Support.
> > > Well, the issue is that LLVM's "libraries" are really not fine grained due > to our build system/source tree layout, so we end up just glomming together > large pieces of (sometimes vaguely) related functionality into "libraries", > which are the units of physical dependency. Realistically, Support/ELF.h is > a fine piece of independent functionality that should be independently > reusable, but we don't have effective tools for managing and maintaining > proper physical dependencies to make that happen. > > There are actually a bunch of things in Support that I wish could be > independently reused. Like if I want to write up a little program, I really > wish I could do > > $ llpm install StringRef ArrayRef raw_ostream MemoryBuffer > > and then have it put those relevant modules and their dependencies in a > local `deps/` (or whatever) directory so that I can then just include them > in my build. > > -- Sean Silva -- 此致 礼 罗勇刚 Yours sincerely, Yonggang Luo From David.Chisnall at cl.cam.ac.uk Thu Jul 4 01:45:27 2013 From: David.Chisnall at cl.cam.ac.uk (David Chisnall) Date: Thu, 4 Jul 2013 09:45:27 +0100 Subject: [LLVMdev] Docs question: legality of inspecting other functions in a function pass In-Reply-To: References: Message-ID: <1A9CAAEA-2840-459E-B860-F83D64743CC5@cl.cam.ac.uk> On 3 Jul 2013, at 23:05, Stephen Lin wrote: > Does anyone know if there's a defined policy about this, either way? > If so, I think it ought to be noted in the docs, for consistency. The prohibition exists, at least in part, because in theory it would be nice to be able to run passes in parallel. It's not a real limitation at the moment because updating instructions in a module is not thread safe (and making it so with the current APIs would probably be somewhat problematic in terms of performance) and so when we do eventually get the ability to run FunctionPasses in parallel they will most likely need new APIs. 
That said, it's a good idea structurally to view the Function / Block as synchronisation boundaries so that it will be easier to support concurrent execution in the future. David From xbaruc00 at stud.fit.vutbr.cz Thu Jul 4 02:09:28 2013 From: xbaruc00 at stud.fit.vutbr.cz (=?UTF-8?B?Um9iZXJ0IEJhcnXEjcOhaw==?=) Date: Thu, 04 Jul 2013 11:09:28 +0200 Subject: [LLVMdev] CallGraph in immutable pass In-Reply-To: References: <51D4251C.5020108@stud.fit.vutbr.cz> Message-ID: <51D53BC8.8020801@stud.fit.vutbr.cz> On 07/03/2013 06:18 PM, Chandler Carruth wrote: > > Your understanding is correct -- this is impossible. What are you > really trying to do? > I'm working on implementation of some fancier alias analysis algorithm. I have experienced strange behavior when I registered my AA (as module pass) into AA group. Somehow I was unable to get correct DataLayout from AA interface. So I wanted to try to make it immutable, just like other AA implementations. Anyway, thanks for clarification. From chandlerc at google.com Thu Jul 4 02:13:29 2013 From: chandlerc at google.com (Chandler Carruth) Date: Thu, 4 Jul 2013 02:13:29 -0700 Subject: [LLVMdev] CallGraph in immutable pass In-Reply-To: <51D53BC8.8020801@stud.fit.vutbr.cz> References: <51D4251C.5020108@stud.fit.vutbr.cz> <51D53BC8.8020801@stud.fit.vutbr.cz> Message-ID: On Thu, Jul 4, 2013 at 2:09 AM, Robert Baručák wrote: > On 07/03/2013 06:18 PM, Chandler Carruth wrote: > >> >> Your understanding is correct -- this is impossible. What are you really >> trying to do? >> >> I'm working on implementation of some fancier alias analysis algorithm. > I have experienced strange behavior when I registered my AA (as module > pass) into AA group. Somehow I was unable to get correct DataLayout from AA > interface. So I wanted to try to make it immutable, just like other AA > implementations. > Anyway, thanks for clarification. 
> Currently, LLVM's pass manager infrastructure and especially the immutable pass based alias analysis group makes stateful alias analyses essentially impossible to do well, and they largely require gross hacks. There is a thread I started many months ago about revamping the pass management in LLVM, and this is one motivating concern although not my primary concern. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cdavis5x at gmail.com Thu Jul 4 05:53:04 2013 From: cdavis5x at gmail.com (Charles Davis) Date: Thu, 4 Jul 2013 06:53:04 -0600 Subject: [LLVMdev] Hi, people, I propose to move Debug and Object File related headers out of Support In-Reply-To: References: Message-ID: <521318E8-FAAC-41D4-88BA-67CDBA0704A7@gmail.com> On Jul 4, 2013, at 1:43 AM, 罗勇刚(Yonggang Luo) wrote: > LLVM is a modularized software system, so I hope it's was modularized, > And ELF.h is definitely belongs to Object by classification, > The things that confused me is the ELF.h was placed under Support and > don't know why. Because it's also used by the MC layer's direct object emission support. Chip From fernleaf07 at gmail.com Thu Jul 4 09:23:12 2013 From: fernleaf07 at gmail.com (Keith Smith) Date: Thu, 4 Jul 2013 12:23:12 -0400 Subject: [LLVMdev] Cygwin, configure and mmap Message-ID: Hello list. I am attempting to build llvm, clang, compiler-rt, and lldb using Cygwin. (don't ask) I am using the 3.3 tag. I have the following problems. 1 - Running configure, PATH=/usr/bin CC=/bin/gcc CXX=/bin/g++ ../llvm-3-3/configure --enable-optimized I see the following message: configure: WARNING: mmap() of a fixed address required but not supported I have seen various postings on the Web that there is an issue with configure, Cygwin and mmap(). I didn't see this issue in the FAQs. Is this warning a concern?
From benoit.noe.perrot at gmail.com Thu Jul 4 00:48:23 2013 From: benoit.noe.perrot at gmail.com (Benoit Perrot) Date: Thu, 4 Jul 2013 09:48:23 +0200 Subject: [LLVMdev] llvm (hence Clang) not compiling with Visual Studio 2008 Message-ID: Hello, I have just updated my svn copy of the llvm/clang repositories after quite a long time of inactivity, and found it not compiling on Windows with Visual Studio 2008. The incriminated file is: llvm/lib/MC/MCModule.cpp Where several calls to "std::lower_bound" are made, like: atom_iterator I = std::lower_bound(atom_begin(), atom_end(), Begin, AtomComp); With: - "atom_iterator" being a typedef on "std::vector::iterator" - "atom_begin()" and "atom_end" returning an "atom_iterator" - "Begin" being an "uint64_t" - "AtomComp" being a predicate of type "bool (const llvm::MCAtom *,uint64_t)" This seems to be due to an invalid implementation of the STL as provided with Visual Studio 2008. Indeed, the predicate given to "lower_bound" must respect the following rules: - obviously, it shall return "bool" (here: of course) - its first argument shall be of a type into which the type of the dereferenced iterators can be implicitly converted (here: "atom_iterator::operator*" returns a "llvm::Atom*", and the first argument of "AtomComp" is also "llvm::Atom*" - its second argument shall be of a type into which the type of the value can be implicitly converted (here: "Begin" is an "uint_64_t", as well as the second argument of "AtomComp") But the implementation of "std::lower_bound" in Visual Stuio 2008 relies on a checker provided by the file "xutility", which reads: template inline bool __CLRCALL_OR_CDECL _Debug_lt_pred(_Pr _Pred, _Ty1& _Left, const _Ty2& _Right, const wchar_t *_Where, unsigned int _Line) { // test if _Pred(_Left, _Right) and _Pred is strict weak ordering if (!_Pred(_Left, _Right)) return (false); else if (_Pred(_Right, _Left)) _DEBUG_ERROR2("invalid operator<", _Where, _Line); return (true); } Hence, it expects the predicate 
(here "_Pred") to accept as arguments both (_Ty1, _Ty2) and (_Ty2, _Ty1), which does not seem consistent with the specifications mentioned above. Solutions here: 1. consider that the implementation if effectively wrong, and modify the documentation at http://llvm.org/docs/GettingStartedVS.html, requiring Visual Studio 2010, i.e. replacing: "You will need Visual Studio 2008 or higher." by: "You will need Visual Studio 2010 or higher." Same comments on the respect of the standard apply 2. investigate whether there exists a way to disable the aforementioned check; 3. modify the code in MCModule.cpp to cope with the implementation of "lower_bound" in VS 2008. Personally I just went for (1), i.e. switching to Visual Studio 2010, as it was the most straightforward. Doing so, I also had to add "#include " to the file "lib/CodeGen/CGBlocks.cpp" so that llvm/clang can compile with said compiler, because of some obscure external template usage. Regards, -- Benoit PERROT -------------- next part -------------- An HTML attachment was scrubbed... URL: From swlin at post.harvard.edu Thu Jul 4 11:21:38 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Thu, 4 Jul 2013 11:21:38 -0700 Subject: [LLVMdev] Docs question: legality of inspecting other functions in a function pass In-Reply-To: <1A9CAAEA-2840-459E-B860-F83D64743CC5@cl.cam.ac.uk> References: <1A9CAAEA-2840-459E-B860-F83D64743CC5@cl.cam.ac.uk> Message-ID: On Thu, Jul 4, 2013 at 1:45 AM, David Chisnall wrote: > On 3 Jul 2013, at 23:05, Stephen Lin wrote: > >> Does anyone know if there's a defined policy about this, either way? >> If so, I think it ought to be noted in the docs, for consistency. > > The prohibition exists, at least in part, because in theory it would be nice to be able to run passes in parallel. 
It's not a real limitation at the moment because updating instructions in a module is not thread safe (and making it so with the current APIs would probably be somewhat problematic in terms of performance) and so when we do eventually get the ability to run FunctionPasses in parallel they will most likely need new APIs. That said, it's a good idea structurally to view the Function / Block as synchronisation boundaries so that it will be easier to support concurrent execution in the future. > I understand the rationale but are you sure that the prohibition against *inspecting* other functions during a function pass does exist and is currently followed? If it does I think the docs ought to make that clear so I want to make sure if the omission is not deliberate. In theory you could still parallelize function pass execution if they inspected other functions if they used some kind of read/write locking and used transactional updates; I would think the main point is that we want the results to be deterministic and not dependent on the order in which functions are processed, which applies regardless of what kind of parallelization and/or synchronization is used. Stephen From robert at xmos.com Thu Jul 4 11:36:49 2013 From: robert at xmos.com (Robert Lytton) Date: Thu, 4 Jul 2013 18:36:49 +0000 Subject: [LLVMdev] making a copy of a byval aggregate on the callee's frame Message-ID: Hi - help! I have read through previous threads on the subject of 'byval' e.g. https://groups.google.com/forum/#!topicsearchin/llvm-dev/Exact$20meaning$20of$20byval/llvm-dev/cyRZyXcMCNI https://groups.google.com/forum/#!topicsearchin/llvm-dev/$20byval/llvm-dev/uk4uiK93jeM https://groups.google.com/forum/#!topicsearchin/llvm-dev/byval/llvm-dev/46Tv0lSRwBg and read through code (as best I can) but I am no wiser. I am using the XCore target where the pointee data needs to be copied by the callee (not the caller). 
So: > I am not sure what this means though - when I generate code > from the LLVM assembly, do I need to do anything with byval? yes, the pointee needs to be passed by-copy, which usually means on the stack but could mean in a bunch of registers. > Either in the calling location or in the called function? The caller does the copy IIRC. If you look at the .s file you should see it happening. unfortunately does not help me. There seems to be some disagreement about whether it should be done in clang or llvm. Indeed I have hacked clang's CodeGenFunction::EmitFunctionProlog() and it works - but it is not nice. BUT it seems most believe that it should be done within llvm using 'byVal'. I have tried to follow the 'byval' flag but am too ignorant to make any meaningful headway viz: I tried adding to the XCoreCallingConv.td: CCIfByVal> // pushes pointer to the stack and CCIfByVal> But I have got stuck knowing if I can add the copy at this stage. Are the pointee's details available or only the pointer's? (Sorry if I have not dug deep enough to trace from pointer to pointee) I also started to look at the XCoreFrameLowering::emitPrologue() Unfortunately, whilst stumbling around in the code is an interesting way to see the scenery, I am losing any sense of direction I may have had to start with! Any input gratefully received - including key source files or the order in which things happen. I'll try tracing through the calls (will -debug be enough) tomorrow. thank you robert -------------- next part -------------- An HTML attachment was scrubbed... URL: From t.p.northover at gmail.com Thu Jul 4 12:24:38 2013 From: t.p.northover at gmail.com (Tim Northover) Date: Thu, 4 Jul 2013 20:24:38 +0100 Subject: [LLVMdev] making a copy of a byval aggregate on the callee's frame In-Reply-To: References: Message-ID: Hi Robert, > I tried adding to the XCoreCallingConv.td: > CCIfByVal> // pushes pointer to the stack This looks sensible to me.
After that it comes down to cooperation between XCoreISelLowering's LowerFormalArguments and LowerCall functions. LowerFormalArguments is at the beginning of a function and is responsible for taking arguments out of registers and putting them into sensible places for the rest of the function to use. LowerCall is responsible for putting call arguments where callees will expect them and making the call.

On most targets, for byval, LowerCall would store the argument by value on the stack (likely with a memcpy equivalent from the actual pointer that's being passed); and LowerFormalArguments would create a fixed FrameIndex pointing there and record that as the address for use by everything else.

You'll want to do basically the reverse: LowerCall will just put the pointer it's given on the stack; LowerFormalArguments will do the memcpy-like operation into a local variable created for the purpose (also a FrameIndex, but of a different kind), then it'll record that frame index as the address for everything else to use.

Hope this helps; come back if there's still stuff you don't understand.

Cheers.

Tim.

From letz at grame.fr Thu Jul 4 13:39:34 2013
From: letz at grame.fr (Stéphane Letz)
Date: Thu, 4 Jul 2013 22:39:34 +0200
Subject: [LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR
Message-ID:

Hi,

Our DSL can generate C or directly generate LLVM IR. With LLVM 3.3, we can vectorize the produced C code using clang with -O3, or clang with -O1 then opt -O3 -vectorize-loops. But the directly generated LLVM IR version of the same program cannot be vectorized with opt -O3 -vectorize-loops. So our guess is that our generated LLVM IR lacks some information needed by the vectorization passes to work correctly.

Any idea of what could be lacking?
Thanks

Stéphane Letz

From robert at xmos.com Thu Jul 4 13:43:10 2013
From: robert at xmos.com (Robert Lytton)
Date: Thu, 4 Jul 2013 20:43:10 +0000
Subject: [LLVMdev] making a copy of a byval aggregate on the callee's frame
In-Reply-To:
References: ,
Message-ID:

Hi Tim,

Thank you for the input. I think I follow you. I believe LowerCall is doing what it needs to do - passing the pointer either on the stack or in a register as per the ABI. LowerFormalArguments() is where I am stuck. LowerFormalArguments() calls CCInfo.AnalyzeFormalArguments(Ins, CC_XCore), which calls CC_XCore(). This is where I placed the CCIfByVal<CCPassByVal<4, 4>>, which only pushed the pointer to the stack. However, I don't want to push the pointer to the stack but COPY the pointee. Indeed, I want to keep the pointer where it is BUT re-point it to a new object copied onto the callee's stack. Hmmm.

What I really want is something like:

static bool CC_XCore(...) {
  if (ArgFlags.isByVal()) {                // a 'CCCustom' function
    Size = ValNo.pointee.size;             // where do I get the pointee's info from?
    NewVal = State.AllocateStack(Size, 4); // how do I create space on the callee stack?
    memcpy(NewVal, ValNo.pointee, Size);   // how do I copy from the caller's stack to the callee's stack?
    ValNo.pointee = NewVal;                // how do I re-point the pointer at the callee's stack (rather than the caller's data)?
  }
}

If this is the correct way to do it, I will continue to plough ahead. As you can see, I have no idea what API functions I should be using :-)

Thank you
robert

________________________________________
From: Tim Northover [t.p.northover at gmail.com]
Sent: 04 July 2013 20:24
To: Robert Lytton
Cc:
Subject: Re: [LLVMdev] making a copy of a byval aggregate on the callee's frame

Hi Robert,

> I tried adding to the XCoreCallingConv.td:
> CCIfByVal<CCPassByVal<4, 4>> // pushes pointer to the stack

This looks sensible to me. After that it comes down to cooperation between XCoreISelLowering's LowerFormalArguments and LowerCall functions.
LowerFormalArguments is at the beginning of a function and is responsible for taking arguments out of registers and putting them into sensible places for the rest of the function to use. LowerCall is responsible for putting call arguments where callees will expect them and making the call. On most targets, for byval, LowerCall would store the argument by value on the stack (likely with a memcpy equivalent from the actual pointer that's being passed); and LowerFormalArguments would create a fixed FrameIndex pointing there and record that as the address for use by everything else. You'll want to do basically the reverse: LowerCall will just put the pointer it's given on the stack; LowerFormalArguments will do the memcpy like operation into a local variable created for the purpose (also a FrameIndex, but of a different kind), then it'll record that frame-index as the address for everything else to use. Hope this helps; come back if there's still stuff you don't understand. Cheers. Tim. From schnetter at cct.lsu.edu Thu Jul 4 14:00:59 2013 From: schnetter at cct.lsu.edu (Erik Schnetter) Date: Thu, 4 Jul 2013 17:00:59 -0400 Subject: [LLVMdev] round() vs. rint()/nearbyint() with fast-math In-Reply-To: References: <732709722.6092840.1371663865241.JavaMail.root@alcf.anl.gov> <453148523.6096744.1371664589597.JavaMail.root@alcf.anl.gov> <51c43f8c.41a42a0a.4ecb.3739SMTPIN_ADDED_BROKEN@mx.google.com> Message-ID: On Fri, Jun 21, 2013 at 5:11 PM, Erik Schnetter wrote: > On Fri, Jun 21, 2013 at 7:54 AM, David Tweed wrote: > >> | LLVM does not currently have special lowering handling for round(), and >> I'll propose a patch to add that, but the larger question is this: should >> fast-math change the tie-breaking behavior of >> | rint/nearbyint/round, etc. and, if so, should we make a specific effort >> to >> have all backends provide the same guarantee (or lack of a guarantee) in >> this regard? 
>> I don't know, primarily because I've never really been involved in anything
>> where I've cared about using exotic rounding modes. But in general I'm of
>> the opinion that -fast-math is the "nuclear option" that's allowed to do
>> lots of things which may well invoke backend-specific behaviour. (That's
>> also why I think that most FP transformations shouldn't be "only" guarded by
>> fast-math but a more precise option.)
>
> The functions rint and round are standard libm functions commonly used to
> round floating point values to integers. Both round to the nearest integer,
> but break ties differently -- rint uses IEEE tie breaking (towards even),
> round uses mathematical tie breaking (away from zero).
>
> The question here is: Is this optimization worthwhile, or would it
> surprise too many people? Depending on this, it should either be
> disallowed, or possibly implemented for other back-ends as well.

After some consideration, I have come to the conclusion that this optimization (changing rint to round) is not worthwhile. There are some floating point operations that can provide an exact result, and not obtaining this exact result is surprising. For example, I would expect that adding/multiplying two small integers gives the exact result, or that fmin/fmax give the correct result if no nans are involved, or that comparisons yield the correct answer (again in the absence of nans, denormalized numbers etc.).

The case here -- rint(0.5) -- involves an input that can be represented exactly, and an output that can be represented exactly (0.0). Neither nans, infinities, nor denormalized numbers are involved. In this case I do expect the correct answer, even with fast floating point operations that ignore nans, infinities, denormalized numbers, or that re-associate etc.

-erik

PS: I think that

rint(x) = x + copysign(M,x) - copysign(M,x)

where M is a magic number, and where the addition and subtraction cannot be optimized. I believe M=2^52.
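Both claims -- the tie-breaking difference between rint and round, and the 2^52 magic-number identity -- can be checked with a small standalone sketch (illustrative only; volatile keeps the compiler from folding the add/subtract pair away, and the identity assumes the default round-to-nearest-even mode):

```cpp
#include <cassert>
#include <cmath>

// rint breaks ties to even (in the default rounding mode);
// round breaks ties away from zero.
// The identity rint(x) == x + copysign(M, x) - copysign(M, x) holds for
// |x| < 2^52: adding 2^52 pushes the value into a range where doubles
// have no fractional bits, so the addition itself performs the rounding.
double rintViaMagic(double x) {
  const double M = 4503599627370496.0;  // 2^52
  volatile double t = x + std::copysign(M, x);
  return t - std::copysign(M, x);
}
```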
This should work fine at least for "reasonably small" numbers.

--
Erik Schnetter
http://www.perimeterinstitute.ca/personal/eschnetter/

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From t.p.northover at gmail.com Thu Jul 4 14:25:10 2013
From: t.p.northover at gmail.com (Tim Northover)
Date: Thu, 4 Jul 2013 22:25:10 +0100
Subject: [LLVMdev] making a copy of a byval aggregate on the callee's frame
In-Reply-To:
References:
Message-ID:

Hi,

> I believe the LowerCall is doing what it needs to do - passing pointer either on the stack or in register as per ABI.

From very quick test-cases with no understanding of XCore, that looks plausible.

> LowerFormalArguments () calls CCInfo.AnalyzeFormalArguments(Ins, CC_XCore), which calls the CC_XCore().
> This is where I placed the CCIfByVal<CCPassByVal<4, 4>> which only pushed the pointer to the stack.

Really, all it did was ask LowerCall and LowerFormalArguments to pass the pointer on the stack (well, strictly "ByVal") as they see fit.

> static bool CC_XCore(...) {
>
> }

I think you're misinterpreting the purpose of these CC_* functions. They don't actually do any of the work themselves. Their job is to decide in broad terms where an argument goes and to record that decision for the LowerWhatever functions. In fact, they don't have access to any of the CodeGen or SelectionDAG machinery necessary to do the job themselves.

The idea is that the DAG nodes we need to produce are actually different in caller and callee, but whether some argument goes in R0 or R1 (or a stack slot) should hopefully be the same. So the CC_* functions (& TableGen) take care of the first bit and then the Lower* functions interpret those results in a target-specific way (often). There are usually special checks for byval in those functions which make copies at the appropriate points.
> As you can see, I have no idea what API functions I should be using :-)

I *think* you'll be fine with using CCPassByVal and won't need a custom handler from what I've heard. At the very least you should be focussing your attention on these Lower* functions for now. They're horribly complicated, but I'm afraid that's something you just have to deal with. They're where the real meaning of "byval" is decided.

Cheers.

Tim.

From robert at xmos.com Thu Jul 4 14:48:46 2013
From: robert at xmos.com (Robert Lytton)
Date: Thu, 4 Jul 2013 21:48:46 +0000
Subject: [LLVMdev] making a copy of a byval aggregate on the callee's frame
In-Reply-To:
References: ,
Message-ID:

Hi Tim,

I may be missing something, but using CCPassByVal is moving the pointer onto the stack - not what I'm after. I need to add an operation to the function prolog that actually makes a copy of the pointed-to data. It is the responsibility of the callee to make the copy, not the caller - hence my trouble. (Currently the callee can corrupt the original data, viz pass-by-reference!) This should ideally be done early on in the IR in my thinking - to allow optimisation if the data is only ever read.

FYI, the clang hack is - notice the "CreateMemCpy":

CodeGenFunction::EmitFunctionProlog() {
  ...
  if (ArgI.getIndirectByVal()) {
    llvm::AllocaInst *ByValue = CreateMemTemp(Ty, V->getName() + "agg.tmp");
    llvm::ConstantInt *size = llvm::ConstantInt::get(IntPtrTy,
        getContext().getTypeSizeInChars(Ty).getQuantity());
    Builder.CreateMemCpy(ByValue, V, size, 4);
    V = ByValue;  // and point the pointer to the new pointee!
  }

So, can it be done in the llvm? Should it be done in the llvm? I probably need to go away and read up on the llvm AST handling (is that the right name?).
robert

_______________________________________
From: Tim Northover [t.p.northover at gmail.com]
Sent: 04 July 2013 22:25
To: Robert Lytton
Cc:
Subject: Re: [LLVMdev] making a copy of a byval aggregate on the callee's frame

Hi,

> I believe the LowerCall is doing what it needs to do - passing pointer either on the stack or in register as per ABI.

From very quick test-cases with no understanding of XCore, that looks plausible.

> LowerFormalArguments () calls CCInfo.AnalyzeFormalArguments(Ins, CC_XCore), which calls the CC_XCore().
> This is where I placed the CCIfByVal<CCPassByVal<4, 4>> which only pushed the pointer to the stack.

Really, all it did was ask LowerCall and LowerFormalArguments to pass the pointer on the stack (well, strictly "ByVal") as they see fit.

> static bool CC_XCore(...) {
>
> }

I think you're misinterpreting the purpose of these CC_* functions. They don't actually do any of the work themselves. Their job is to decide in broad terms where an argument goes and to record that decision for the LowerWhatever functions. In fact, they don't have access to any of the CodeGen or SelectionDAG machinery necessary to do the job themselves.

The idea is that the DAG nodes we need to produce are actually different in caller and callee, but whether some argument goes in R0 or R1 (or a stack slot) should hopefully be the same. So the CC_* functions (& TableGen) take care of the first bit and then the Lower* functions interpret those results in a target-specific way (often). There are usually special checks for byval in those functions which make copies at the appropriate points.

> As you can see, I have no idea what api functions I should be using :-)

I *think* you'll be fine with using CCPassByVal and won't need a custom handler from what I've heard. At the very least you should be focussing your attention on these Lower* functions for now. They're horribly complicated, but I'm afraid that's something you just have to deal with.
They're where the real meaning of "byval" is decided.

Cheers.

Tim.

From ahmed.bougacha at gmail.com Thu Jul 4 16:43:13 2013
From: ahmed.bougacha at gmail.com (Ahmed Bougacha)
Date: Thu, 4 Jul 2013 16:43:13 -0700
Subject: [LLVMdev] [cfe-dev] llvm (hence Clang) not compiling with Visual Studio 2008
In-Reply-To:
References:
Message-ID:

On Thu, Jul 4, 2013 at 12:48 AM, Benoit Perrot wrote:
> Hello,

Hi Benoit,

> I have just updated my svn copy of the llvm/clang repositories after quite
> a long time of inactivity, and found it not compiling on Windows with
> Visual Studio 2008.
>
> The incriminated file is:
>
> llvm/lib/MC/MCModule.cpp
>
> Where several calls to "std::lower_bound" are made, like:
>
> atom_iterator I = std::lower_bound(atom_begin(), atom_end(),
>                                    Begin, AtomComp);
>
> With:
> - "atom_iterator" being a typedef on "std::vector<MCAtom*>::iterator"
> - "atom_begin()" and "atom_end()" returning an "atom_iterator"
> - "Begin" being a "uint64_t"
> - "AtomComp" being a predicate of type "bool (const llvm::MCAtom *, uint64_t)"
>
> This seems to be due to an invalid implementation of the STL as provided
> with Visual Studio 2008.
> Indeed, the predicate given to "lower_bound" must respect the following
> rules:
>
> - obviously, it shall return "bool" (here: of course)
>
> - its first argument shall be of a type into which the type of the
> dereferenced iterators can be implicitly converted (here:
> "atom_iterator::operator*" returns an "llvm::MCAtom*", and the first argument
> of "AtomComp" is also "llvm::MCAtom*")
>
> - its second argument shall be of a type into which the type of the
> value can be implicitly converted (here: "Begin" is a "uint64_t", as well
> as the second argument of "AtomComp")
>
> But the implementation of "std::lower_bound" in Visual Studio 2008 relies
> on a checker provided by the file "xutility", which reads:
>
> template<class _Pr, class _Ty1, class _Ty2> inline
> bool __CLRCALL_OR_CDECL _Debug_lt_pred(_Pr _Pred,
>     _Ty1& _Left,
>     const _Ty2& _Right,
>     const wchar_t *_Where,
>     unsigned int _Line)
> { // test if _Pred(_Left, _Right) and _Pred is strict weak ordering
>     if (!_Pred(_Left, _Right))
>         return (false);
>     else if (_Pred(_Right, _Left))
>         _DEBUG_ERROR2("invalid operator<", _Where, _Line);
>     return (true);
> }
>
> Hence, it expects the predicate (here "_Pred") to accept as arguments
> both (_Ty1, _Ty2) and (_Ty2, _Ty1), which does not seem consistent with the
> specifications mentioned above.
>
> Solutions here:
>
> 1. consider that the implementation is effectively wrong, and modify the
> documentation at http://llvm.org/docs/GettingStartedVS.html, requiring
> Visual Studio 2010, i.e. replacing:
>
> "You will need Visual Studio 2008 or higher."
>
> by:
>
> "You will need Visual Studio 2010 or higher."
>
> The same comments on respecting the standard apply.
>
> 2. investigate whether there exists a way to disable the aforementioned
> check;
>
> 3. modify the code in MCModule.cpp to cope with the implementation of
> "lower_bound" in VS 2008.
>
> Personally I just went for (1), i.e. switching to Visual Studio 2010, as
> it was the most straightforward.

(3) Fixed in r185676.
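For reference, the pattern at issue is a standard-conforming heterogeneous predicate: lower_bound only ever calls it as (element, value), which is all the standard requires, while the VS2008 debug check also tries the reversed call. A simplified self-contained sketch (names modeled on the MCModule code above, but illustrative, not the actual source):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Atom { uint64_t Begin; };

// Asymmetric comparator: element on the left, searched value on the
// right. std::lower_bound only ever calls it this way round, which is
// all the standard requires; VS2008's _Debug_lt_pred also attempts the
// reversed call (uint64_t, const Atom*) and therefore rejects it.
static bool AtomComp(const Atom *A, uint64_t Addr) {
  return A->Begin < Addr;
}

// Returns the first atom whose Begin is >= Addr, or nullptr.
const Atom *findAtom(const std::vector<const Atom *> &Atoms, uint64_t Addr) {
  auto I = std::lower_bound(Atoms.begin(), Atoms.end(), Addr, AtomComp);
  return I == Atoms.end() ? nullptr : *I;
}

// Small demo over a fixed sorted table; returns ~0 when nothing matches.
uint64_t lookupDemo(uint64_t Addr) {
  static const Atom A{1}, B{5}, C{9};
  static const std::vector<const Atom *> Atoms{&A, &B, &C};
  const Atom *R = findAtom(Atoms, Addr);
  return R ? R->Begin : ~0ULL;
}
```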
Requiring VS 2010 for a minor problem like this (even though there are more like it) isn't warranted, I think.

> Doing so, I also had to add "#include <...>" to the file
> "lib/CodeGen/CGBlocks.cpp" so that llvm/clang can compile with said
> compiler, because of some obscure external template usage.

That header is already included, at least by StringRef.h, so I'm curious: what is this obscure thing that needs including it again?

Thanks,
-- Ahmed Bougacha

> Regards,
> --
> Benoit PERROT
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Andrew.Jost at synopsys.com Thu Jul 4 16:39:41 2013
From: Andrew.Jost at synopsys.com (Andy Jost)
Date: Thu, 4 Jul 2013 23:39:41 +0000
Subject: [LLVMdev] Kaleidoscope Tutorial is Out of Date
Message-ID:

I'm working through the Kaleidoscope tutorials for LLVM 3.3 and noticed the code listing for chapter 4 is out of date on the web. Take a look at http://llvm.org/releases/3.3/docs/tutorial/LangImpl4.html. The first line includes llvm/DerivedTypes.h, but this does not compile. The correct path is llvm/IR/DerivedTypes.h. There are other differences, and the examples that I pulled along with the source are correct.

-Andy

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hfinkel at anl.gov Thu Jul 4 17:06:40 2013
From: hfinkel at anl.gov (Hal Finkel)
Date: Thu, 4 Jul 2013 19:06:40 -0500 (CDT)
Subject: [LLVMdev] round() vs.
rint()/nearbyint() with fast-math In-Reply-To: Message-ID: <343771953.9941360.1372982800120.JavaMail.root@alcf.anl.gov> ----- Original Message ----- > > On Fri, Jun 21, 2013 at 5:11 PM, Erik Schnetter < > schnetter at cct.lsu.edu > wrote: > > > > > > > On Fri, Jun 21, 2013 at 7:54 AM, David Tweed < david.tweed at arm.com > > wrote: > > > > > > > | LLVM does not currently have special lowering handling for round(), > | and > I'll propose a patch to add that, but the larger question is this: > should > fast-math change the tie-breaking behavior of > | rint/nearbyint/round, etc. and, if so, should we make a specific > | effort to > have all backends provide the same guarantee (or lack of a guarantee) > in > this regard? > > I don't know, primarily because I've never really been involved in > anything > where I've cared about using exotic rounding modes. But in general > I'm of > the opinion that -fast-math is the "nuclear option" that's allowed to > do > lots of things which may well invoke backend specific behaviour. > (That's > also why I think that most FP transformations shouldn't be "only" > guarded by > fast-math but a more precise option.) > > > The functions rint and round and standard libm functions commonly > used to round floating point values to integers. Both round to the > nearest integer, but break ties differently -- rint uses IEEE tie > breaking (towards even), round uses mathematical tie breaking (away > from zero). > > > The question here is: Is this optimization worthwhile, or would it > surprise too many people? Depending on this, it should either be > disallowed, or possibly implemented for other back-ends as well. > > > After some consideration, I have come to the conclusion that this > optimization (changing rint to round) is not worthwhile. There are > some floating point operations that can provide an exact result, and > not obtaining this exact result is surprising. 
For example, I would > expect that adding/multiplying two small integers gives the exact > result, or that fmin/fmax give the correct result if no nans are > involved, or that comparisons yield the correct answer (again in the > absence of nans, denormalized numbers etc.). > > > The case here -- rint(0.5) -- involves an input that can be > represented exactly, and an output that can be represented exactly > (0.0). Neither nans, infinities, nor denormalized numbers are > involved. In this case I do expect the correct answer, even with > full floating point operations that ignore nans, infinities, > denormalized numbers, or that re-associate etc. I've been thinking about this for some time as well, and I've come to the same conclusion. I'll be updating the PPC backend accordingly in the near future. frin should really map to round() and not rint(), and we should leave it at that. Thanks again, Hal > > > -erik > > > PS: > > > I think that > > > rint(x) = x + copysign(M,x) - copysign(M,x) > > > where M is a magic number, and where the addition and subtraction > cannot be optimized. I believe M=2^52. This should work fine at > least for "reasonably small" numbers. > > -- > Erik Schnetter < schnetter at cct.lsu.edu > > http://www.perimeterinstitute.ca/personal/eschnetter/ -- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory From clinghacker at gmail.com Thu Jul 4 18:28:09 2013 From: clinghacker at gmail.com (hacker cling) Date: Fri, 5 Jul 2013 09:28:09 +0800 Subject: [LLVMdev] Any suggestion for "Unknown instruction type encountered" error? Message-ID: Hello all, I was playing with LLVM pass. I changed the lib/Transforms/Hello/Hello.cpp 's content to be my own pass. Then I make install the pass and use an example test1.c to see whether it works or not. 
When I run example using the following command: clang -emit-llvm test1.c -c -o test1.bc opt -load ../build_llvm/Debug+Asserts/lib/LLVMHello.so -hello < test1.bc > /dev/null It shows the following error: Unknown instruction type encountered! UNREACHABLE executed at include/llvm/InstVisitor.h:120! 0 opt 0x00000000014190b6 llvm::sys::PrintStackTrace(_IO_FILE*) + 38 1 opt 0x0000000001419333 2 opt 0x0000000001418d8b 3 libpthread.so.0 0x0000003aa600f500 4 libc.so.6 0x0000003aa5c328a5 gsignal + 53 5 libc.so.6 0x0000003aa5c34085 abort + 373 6 opt 0x000000000140089b 7 LLVMHello.so 0x00007f889beb5833 8 LLVMHello.so 0x00007f889beb57bd 9 LLVMHello.so 0x00007f889beb575e 10 LLVMHello.so 0x00007f889beb56c5 11 LLVMHello.so 0x00007f889beb55f2 12 LLVMHello.so 0x00007f889beb5401 13 opt 0x00000000013a4e21 llvm::FPPassManager::runOnFunction(llvm::Function&) + 393 14 opt 0x00000000013a5021 llvm::FPPassManager::runOnModule(llvm::Module&) + 89 15 opt 0x00000000013a5399 llvm::MPPassManager::runOnModule(llvm::Module&) + 573 16 opt 0x00000000013a59a8 llvm::PassManagerImpl::run(llvm::Module&) + 254 17 opt 0x00000000013a5bbf llvm::PassManager::run(llvm::Module&) + 39 18 opt 0x000000000084b455 main + 5591 19 libc.so.6 0x0000003aa5c1ecdd __libc_start_main + 253 20 opt 0x000000000083d359 Stack dump: 0. Program arguments: opt -load ../build_llvm/Debug+Asserts/lib/LLVMHello.so -hello 1. Running pass 'Function Pass Manager' on module ''. 2. Running pass 'Hello Pass' on function '@main' I will illustrate the pass code, the test1 example, and the IR generated below, so that anyone could help me or give me some suggestion. Thanks. 
The Hello.cpp pass is as the following:

#define DEBUG_TYPE "hello"
#include "llvm/Pass.h"
#include "llvm/IR/Module.h"
#include "llvm/InstVisitor.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/Support/raw_ostream.h"

namespace {
struct Hello : public llvm::FunctionPass, llvm::InstVisitor<Hello> {
private:
  llvm::BasicBlock *FailBB;

public:
  static char ID;
  Hello() : llvm::FunctionPass(ID) { FailBB = 0; }

  virtual bool runOnFunction(llvm::Function &F) {
    visit(F);
    return false;
  }

  llvm::BasicBlock *getTrapBB(llvm::Instruction &Inst) {
    if (FailBB) return FailBB;
    llvm::Function *Fn = Inst.getParent()->getParent();
    llvm::LLVMContext &ctx = Fn->getContext();
    llvm::IRBuilder<> builder(ctx);
    FailBB = llvm::BasicBlock::Create(ctx, "FailBlock", Fn);
    llvm::ReturnInst::Create(Fn->getContext(), FailBB);
    return FailBB;
  }

  void visitLoadInst(llvm::LoadInst &LI) { }

  void visitStoreInst(llvm::StoreInst &SI) {
    llvm::Value *Addr = SI.getOperand(1);
    llvm::PointerType *PTy = llvm::cast<llvm::PointerType>(Addr->getType());
    llvm::Type *ElTy = PTy->getElementType();
    if (!ElTy->isPointerTy()) {
      llvm::BasicBlock *OldBB = SI.getParent();
      llvm::errs() << "yes, got it \n";
      llvm::ICmpInst *Cmp = new llvm::ICmpInst(&SI, llvm::CmpInst::ICMP_EQ, Addr,
          llvm::Constant::getNullValue(Addr->getType()), "");
      llvm::Instruction *Iter = &SI;
      OldBB->getParent()->dump();
      llvm::BasicBlock *NewBB = OldBB->splitBasicBlock(Iter, "newBlock");
      OldBB->getParent()->dump();
    }
  }
};

char Hello::ID = 0;
static llvm::RegisterPass<Hello> X("hello", "Hello Pass", false, false);
}

The test1.c example is as the following:

#include <stdio.h>

void main() {
  int x;
  x = 5;
}

The IR for the example after adding the pass is as the following:

define void @main() #0 {
entry:
  %x = alloca i32, align 4
  %0 = icmp eq i32* %x, null
  br label %newBlock

newBlock:                                  ; preds = %entry
  store i32 5, i32* %x, align 4
  ret void
}

any suggestion?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From tobias at grosser.es Thu Jul 4 19:11:26 2013 From: tobias at grosser.es (Tobias Grosser) Date: Thu, 04 Jul 2013 19:11:26 -0700 Subject: [LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR In-Reply-To: References: Message-ID: <51D62B4E.9050406@grosser.es> On 07/04/2013 01:39 PM, Stéphane Letz wrote: > Hi, > > Our DSL can generate C or directly generate LLVM IR. With LLVM 3.3, we can vectorize the C produced code using clang with -O3, or clang with -O1 then opt -O3 -vectorize-loops. But the same program generating LLVM IR version cannot be vectorized with opt -O3 -vectorize-loops. So our guess is that our generated LLVM IR lacks some informations that are needed by the vectorization passes to correctly work. > > Any idea of what could be lacking? Without any knowledge about the code guessing is hard. You may miss the 'noalias' keyword or nsw/nuw flags, but there are many possibilities. If you add '-debug' to opt you may get some hints. Also, if you have a small test case, posting the LLVM-IR may help. Cheers, Tobias From etherzhhb at gmail.com Thu Jul 4 19:20:35 2013 From: etherzhhb at gmail.com (Hongbin Zheng) Date: Fri, 5 Jul 2013 10:20:35 +0800 Subject: [LLVMdev] [Polly] Assert in Scope construction In-Reply-To: <006101ce7813$cf856260$6e902720$@codeaurora.org> References: <006101ce7813$cf856260$6e902720$@codeaurora.org> Message-ID: Hi Sergei, On Thu, Jul 4, 2013 at 1:36 AM, Sergei Larin wrote: > Should have changed the subject line... > > --- > Qualcomm Innovation Center, Inc. 
is a member of Code Aurora Forum, hosted
> by
> The Linux Foundation
>
> > -----Original Message-----
> > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> > On Behalf Of Sergei Larin
> > Sent: Wednesday, July 03, 2013 12:29 PM
> > To: 'Tobias Grosser'
> > Cc: 'llvmdev'
> > Subject: Re: [LLVMdev] [LNT] Question about results reliability in LNT
> > infrustructure
> >
> > Tobias,
> >
> > I seem to trigger an assert in Polly lib/Analysis/TempScopInfo.cpp
> >
> > void TempScopInfo::buildAffineCondition(Value &V, bool inverted,
> >     Comparison **Comp) const {
> > ...
> >     ICmpInst *ICmp = dyn_cast<ICmpInst>(&V);
> >     assert(ICmp && "Only ICmpInst of constant as condition supported!");
> > ...
> >
> > The code it chokes on looks like this (see below). The problem is this OR-ed
> > compare result:
> >
> > %cmp3 = icmp sgt i32 %j.0, 2
> > %cmp5 = icmp eq i32 %j.0, 1
> > %or.cond13 = or i1 %cmp3, %cmp5    <<<<<<<<<<<< == Value V
> >
> > My question - is this a bug or a (missing) feature?

I think it is a bug.

> > ...and how it should be
> > handled in theory?

Such a condition should be filtered out by ScopDetection (line 188 of ScopDetection.cpp).

Thanks
Hongbin

> > > Thanks.
> > > > Sergei > > > > define i32 @main() #0 { > > entry: > > %j.0.lcssa.reg2mem = alloca i32, align 8 > > br label %entry.split > > > > entry.split: ; preds = %entry > > %call = tail call i32 @foo(i32 0, i32 0) #2 > > %call1 = tail call i32 @foo(i32 %call, i32 %call) #2 > > br label %for.cond2 > > > > for.cond2: ; preds = %for.inc, > > %entry.split > > %j.0 = phi i32 [ 0, %entry.split ], [ %inc, %for.inc ] > > %cmp3 = icmp sgt i32 %j.0, 2 > > %cmp5 = icmp eq i32 %j.0, 1 > > %or.cond13 = or i1 %cmp3, %cmp5 > > store i32 %j.0, i32* %j.0.lcssa.reg2mem, align 8 > > br i1 %or.cond13, label %for.end8, label %for.inc, > > !llvm.listen.preserve.while.opt !0 > > > > for.inc: ; preds = %for.cond2 > > %inc = add nsw i32 %j.0, 1 > > br label %for.cond2 > > > > for.end8: ; preds = %for.cond2 > > %j.0.lcssa.reload = load i32* %j.0.lcssa.reg2mem, align 8 > > %cmp10 = icmp eq i32 %j.0.lcssa.reload, 1 > > %add = add nsw i32 %j.0.lcssa.reload, 1 > > %retval.0 = select i1 %cmp10, i32 1, i32 %add > > ret i32 %retval.0 > > } > > > > > > --- > > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, > > hosted by The Linux Foundation > > > > > > > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tobias at grosser.es Thu Jul 4 19:40:54 2013 From: tobias at grosser.es (Tobias Grosser) Date: Thu, 04 Jul 2013 19:40:54 -0700 Subject: [LLVMdev] [Polly] Assert in Scope construction In-Reply-To: References: <006101ce7813$cf856260$6e902720$@codeaurora.org> Message-ID: <51D63236.1020204@grosser.es> On 07/04/2013 07:20 PM, Hongbin Zheng wrote: > Hi Sergei, > > > On Thu, Jul 4, 2013 at 1:36 AM, Sergei Larin wrote: > >> Should have changed the subject line... >> >> --- >> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted >> by >> The Linux Foundation >> >> >>> -----Original Message----- >>> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] >>> On Behalf Of Sergei Larin >>> Sent: Wednesday, July 03, 2013 12:29 PM >>> To: 'Tobias Grosser' >>> Cc: 'llvmdev' >>> Subject: Re: [LLVMdev] [LNT] Question about results reliability in LNT >>> infrustructure >>> >>> >>> Tobias, >>> >>> I seem to trigger an assert in Polly lib/Analysis/TempScopInfo.cpp >>> >>> void TempScopInfo::buildAffineCondition(Value &V, bool inverted, >>> Comparison **Comp) const { ... >>> ICmpInst *ICmp = dyn_cast(&V); >>> assert(ICmp && "Only ICmpInst of constant as condition supported!"); >> ... >>> >>> The code it chokes on looks like this (see below). The problem is this >> OR-ed >>> compare result: >>> >>> %cmp3 = icmp sgt i32 %j.0, 2 >>> %cmp5 = icmp eq i32 %j.0, 1 >>> %or.cond13 = or i1 %cmp3, %cmp5 >>> <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< == >>> Value V >>> >>> >>> My question - is this a bug or a (missing) feature? > > I think it is a bug. > > >> ...and how it should >> be >>> handled in theory? >> > Such condition should be filtered out by ScopDetection (line 188 of > ScopDetection.cpp). Hongbin is right. 
Running your test case gives: $ polly-opt -polly-detect -analyze /tmp/test.ll -debug-only=polly-detect Checking region: entry => Top level region is invalid Checking region: for.cond2 => for.end8 Condition in BB 'for.cond2' neither constant nor an icmp instruction Did you try this test case on the open source version of Polly? In case you want to support this code in the open source version of Polly, the way to go is pretty simple. You first enhance -polly-detect to allow boolean operations like 'and' and 'or'. You then change TempScop and ScopInfo to support those. The changes are pretty straightforward. You just need to keep track of a list of conditions and then and/or the conditions with isl when generating the iteration space. Cheers, Tobias From rafael.espindola at gmail.com Thu Jul 4 20:56:44 2013 From: rafael.espindola at gmail.com (=?UTF-8?Q?Rafael_Esp=C3=ADndola?=) Date: Thu, 4 Jul 2013 23:56:44 -0400 Subject: [LLVMdev] [RFC] Switching make check to use 'set -o pipefail' Message-ID: We currently don't use pipefail when running test under make check. This has the undesirable property that it is really easy for tests to bitrot. For example, something like llc %s | FileCheck %s will still pass if llc crashes after printing what FileCheck was looking for. It is also easy to break the tests when refactoring. I have fixed tests that were doing %clang_cc1 -a-driver-options ... | not grep clearly the test was changed from %clang to %clang_cc1 and we missed the fact that the option also had to be updated. Currently to check a command output and that it doesn't crash we have to do llc %s > %t FileCheck %s < %t I would like to switch to using pipefail instead. That would meant that a simple llc %s | FileCheck %s would check both llc return value and output. I have already cleared all the tests, so all that is missing is changing lit itself. Any objections to doing so? 
Cheers, Rafael From swlin at post.harvard.edu Thu Jul 4 21:15:55 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Thu, 4 Jul 2013 21:15:55 -0700 Subject: [LLVMdev] [RFC] Switching make check to use 'set -o pipefail' In-Reply-To: References: Message-ID: Hi Rafael, I think you saw my other e-mail, but just in case you haven't, do you have any thoughts about making this an option that could be easily disabled on the command line without maintaing a patch to lit? I think it would help out-of-tree target maintainers to transition, since I'm sure there will be a lot of similarly broken tests to fix. Stephen On Thu, Jul 4, 2013 at 8:56 PM, Rafael Espíndola wrote: > We currently don't use pipefail when running test under make check. > This has the undesirable property that it is really easy for tests to > bitrot. For example, something like > > llc %s | FileCheck %s > > will still pass if llc crashes after printing what FileCheck was > looking for. It is also easy to break the tests when refactoring. I > have fixed tests that were doing > > %clang_cc1 -a-driver-options ... | not grep > > clearly the test was changed from %clang to %clang_cc1 and we missed > the fact that the option also had to be updated. > > Currently to check a command output and that it doesn't crash we have to do > > llc %s > %t > FileCheck %s < %t > > I would like to switch to using pipefail instead. That would meant that a simple > > llc %s | FileCheck %s > > would check both llc return value and output. I have already cleared > all the tests, so all that is missing is changing lit itself. Any > objections to doing so? 
> > Cheers, > Rafael > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From luoyonggang at gmail.com Thu Jul 4 21:22:13 2013 From: luoyonggang at gmail.com (=?UTF-8?B?572X5YuH5YiaKFlvbmdnYW5nIEx1bykg?=) Date: Fri, 5 Jul 2013 12:22:13 +0800 Subject: [LLVMdev] Hi, people, I propose to move Debug and Object File related headers out of Support In-Reply-To: <521318E8-FAAC-41D4-88BA-67CDBA0704A7@gmail.com> References: <521318E8-FAAC-41D4-88BA-67CDBA0704A7@gmail.com> Message-ID: 在 2013-7-4 下午8:53,"Charles Davis" 写道: > > > On Jul 4, 2013, at 1:43 AM, 罗勇刚(Yonggang Luo) wrote: > > > LLVM is a modularized software system, so I hope it's was modularized, > > And ELF.h is definitely belongs to Object by classification, > > The things that confused me is the ELF.h was placed under Support and > > don't know why. > Because it's also used by the MC layer's direct object emission support. thanks for your response, did MC layer's direct object emission depends on Object library? > > Chip > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rafael.espindola at gmail.com Thu Jul 4 21:40:34 2013 From: rafael.espindola at gmail.com (=?UTF-8?Q?Rafael_Esp=C3=ADndola?=) Date: Fri, 5 Jul 2013 00:40:34 -0400 Subject: [LLVMdev] [RFC] Switching make check to use 'set -o pipefail' In-Reply-To: References: Message-ID: On 5 July 2013 00:15, Stephen Lin wrote: > Hi Rafael, > > I think you saw my other e-mail, but just in case you haven't, do you > have any thoughts about making this an option that could be easily > disabled on the command line without maintaing a patch to lit? I think > it would help out-of-tree target maintainers to transition, since I'm > sure there will be a lot of similarly broken tests to fix. I don't think it is all that many since it was less than one day of work for the in tree ones. 
But if there is the desire for such an option I can try to add it. What should I use? An environment variable? > Stephen > Cheers, Rafael From swlin at post.harvard.edu Thu Jul 4 22:11:43 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Thu, 4 Jul 2013 22:11:43 -0700 Subject: [LLVMdev] [RFC] Switching make check to use 'set -o pipefail' In-Reply-To: References: Message-ID: > I don't think it is all that many since it was less than one day of > work for the in tree ones. But if there is the desire for such an > option I can try to add it. What should I use? An environment > variable? Hmm, I don't know LLVM's Makefile system well enough to know the easiest way to implement an option; if it's non-trivial then maybe it's not worth it. I also don't know the workflow of most people doing out-of-tree work, so I'm not sure how much impact this might have. It can obviously be temporarily reverted locally pretty easily, but it assumes people are paying attention to LLVMDev/llvm-commits and know what's going on. Also, the commit causing all the new failures might not be as obvious down the line to someone updating their tree irregularly (which is probably true for a lot of academic LLVM users, I'm guessing.) Just curious, does using pipefail give any information about where in the pipe the failure actually comes from? Some kind of message would be useful for debugging purposes, in addition to explaining what's going on to someone who wasn't watching dev lists and commit messages carefully. Anyway, perhaps none of this is as big of a deal as I'm making it out to be; I'll leave it to someone with more awareness of downstream workflows to comment further. 
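For reference on that last question: `set -o pipefail` by itself only changes the pipeline's overall exit status, but bash's `PIPESTATUS` array additionally records each stage's exit code, so in principle a test runner could report where in the pipe the failure came from. A sketch (bash-specific; lit would need to surface this information itself):

```shell
#!/usr/bin/env bash
# By default a pipeline reports only the LAST command's status,
# so a crash early in the pipe (here: false) is masked.
false | true
echo "without pipefail: $?"               # prints: without pipefail: 0

# With pipefail, any failing stage fails the whole pipeline.
set -o pipefail
false | true
echo "with pipefail: $?"                  # prints: with pipefail: 1

# PIPESTATUS records every stage's status, showing *where* it failed.
false | true | false
echo "per-stage: ${PIPESTATUS[*]}"        # prints: per-stage: 1 0 1
```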
> > Cheers, > Rafael Stephen From cdavis5x at gmail.com Thu Jul 4 22:21:46 2013 From: cdavis5x at gmail.com (Charles Davis) Date: Thu, 4 Jul 2013 23:21:46 -0600 Subject: [LLVMdev] Hi, people, I propose to move Debug and Object File related headers out of Support In-Reply-To: References: <521318E8-FAAC-41D4-88BA-67CDBA0704A7@gmail.com> Message-ID: <466D0A86-6789-4BD7-9F83-2AF5C5B145A2@gmail.com> On Jul 4, 2013, at 10:22 PM, 罗勇刚(Yonggang Luo) wrote: > > 在 2013-7-4 下午8:53,"Charles Davis" 写道: > > > > > > On Jul 4, 2013, at 1:43 AM, 罗勇刚(Yonggang Luo) wrote: > > > > > LLVM is a modularized software system, so I hope it's was modularized, > > > And ELF.h is definitely belongs to Object by classification, > > > The things that confused me is the ELF.h was placed under Support and > > > don't know why. > > Because it's also used by the MC layer's direct object emission support. > thanks for your response, did MC layer's direct object emission depends on Object library? > Nope. The Object library only reads object files. MC, on the other hand, only writes object files. Chip -------------- next part -------------- An HTML attachment was scrubbed... URL: From dacian_herbei at yahoo.fr Thu Jul 4 23:02:08 2013 From: dacian_herbei at yahoo.fr (Herbei Dacian) Date: Fri, 5 Jul 2013 07:02:08 +0100 (BST) Subject: [LLVMdev] emulator Message-ID: <1373004128.33099.YahooMailNeo@web172604.mail.ir2.yahoo.com> Hi All, I'm a real newbie to this so I have a few simple question. I would like to make an llvm byte code emulator with certain special features. So, I need an llvm byte code emulator that works out of the box if possible. Does this exist? Is it open source? And if not what is the closest open source to it. best regards, dacian -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From t.p.northover at gmail.com Thu Jul 4 23:43:22 2013 From: t.p.northover at gmail.com (Tim Northover) Date: Fri, 5 Jul 2013 07:43:22 +0100 Subject: [LLVMdev] making a copy of a byval aggregate on the callee's frame In-Reply-To: References: Message-ID: Hi Robert, > This should ideally be done early on in the IR in my thinking - to allow optimisation if the data is only ever read. I've thought that once or twice when dealing with ABIs myself. That's certainly another possibility in your case. You could create a separate FunctionPass that gets executed early on and replaces all byval calls and functions with the correct memcpys. It wouldn't work for other targets because they need more control over just where the copy ends up, but it sounds like you shouldn't have an issue. > So, can it be done in the llvm? Yes, in multiple ways. > Should it be done in the llvm? I think so, one way or the other. Tim. From alexandruionutdiaconescu at gmail.com Thu Jul 4 23:48:45 2013 From: alexandruionutdiaconescu at gmail.com (Alexandru Ionut Diaconescu) Date: Fri, 5 Jul 2013 08:48:45 +0200 Subject: [LLVMdev] emulator In-Reply-To: <1373004128.33099.YahooMailNeo@web172604.mail.ir2.yahoo.com> References: <1373004128.33099.YahooMailNeo@web172604.mail.ir2.yahoo.com> Message-ID: Hi Dacian, What want to make a "llvm byte code emulator", so basically you want to learn how to use LLVM or how to make a compiler to obtain LLVM bytecode? On Fri, Jul 5, 2013 at 8:02 AM, Herbei Dacian wrote: > > Hi All, > I'm a real newbie to this so I have a few simple question. > I would like to make an llvm byte code emulator with certain special > features. > So, I need an llvm byte code emulator that works out of the box if > possible. > Does this exist? Is it open source? > And if not what is the closest open source to it. 
> best regards, > dacian > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -- Best regards, Alexandru Ionut Diaconescu -------------- next part -------------- An HTML attachment was scrubbed... URL: From dacian_herbei at yahoo.fr Thu Jul 4 23:58:40 2013 From: dacian_herbei at yahoo.fr (Herbei Dacian) Date: Fri, 5 Jul 2013 07:58:40 +0100 (BST) Subject: [LLVMdev] emulator In-Reply-To: References: <1373004128.33099.YahooMailNeo@web172604.mail.ir2.yahoo.com> Message-ID: <1373007520.16909.YahooMailNeo@web172605.mail.ir2.yahoo.com> Hi Alexandru, that is not what I wish to do. I want to use the infrastructure that is already available to produce the byte code. Then this byte code will be run by a virtual machine in a special way. That is why I need the code from a virtual machine that I can modify and insert my special handling. I know that there is an llva-emu project but I can't get the sources from anywhere. regards, dacian ________________________________ From: Alexandru Ionut Diaconescu To: Herbei Dacian Cc: "llvmdev at cs.uiuc.edu" Sent: Friday, 5 July 2013, 8:48 Subject: Re: [LLVMdev] emulator Hi Dacian, What want to make a "llvm byte code emulator", so basically you want to learn how to use LLVM or how to make a compiler to obtain LLVM bytecode? On Fri, Jul 5, 2013 at 8:02 AM, Herbei Dacian wrote: >Hi All, >I'm a real newbie to this so I have a few simple question. >I would like to make an llvm byte code emulator with certain special features. >So, I need an llvm byte code emulator that works out of the box if possible. >Does this exist? Is it open source? >And if not what is the closest open source to it. 
>best regards, >dacian > > > >_______________________________________________ >LLVM Developers mailing list >LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu >http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -- Best regards, Alexandru Ionut Diaconescu -------------- next part -------------- An HTML attachment was scrubbed... URL: From baldrick at free.fr Fri Jul 5 00:05:08 2013 From: baldrick at free.fr (Duncan Sands) Date: Fri, 05 Jul 2013 09:05:08 +0200 Subject: [LLVMdev] emulator In-Reply-To: <1373004128.33099.YahooMailNeo@web172604.mail.ir2.yahoo.com> References: <1373004128.33099.YahooMailNeo@web172604.mail.ir2.yahoo.com> Message-ID: <51D67024.8080504@free.fr> Hi Dacian, On 05/07/13 08:02, Herbei Dacian wrote: > > Hi All, > I'm a real newbie to this so I have a few simple question. > I would like to make an llvm byte code emulator with certain special features. > So, I need an llvm byte code emulator that works out of the box if possible. > Does this exist? Is it open source? > And if not what is the closest open source to it. > best regards, > dacian LLVM has a byte code interpreter, lli (run with -force-interpreter to get the interpreter, otherwise it will JIT the byte code). Ciao, Duncan. From robert at xmos.com Fri Jul 5 00:52:22 2013 From: robert at xmos.com (Robert Lytton) Date: Fri, 5 Jul 2013 07:52:22 +0000 Subject: [LLVMdev] making a copy of a byval aggregate on the callee's frame In-Reply-To: References: , Message-ID: Hi Tim, Thought about it last night and was coming to the same conclusion. 1. it cant be done at the end during lowering (target backend). 2. it should be part of llvm as the byVal needs to be handled. As a twist, I have been told that llvm-gcc can lower byVal into memcpy in the callee. I may take a look at this. I wonder if it ever emits 'byVal'... I still feel I don't understand enough about where byVal is used or what it means. 
Is it *only* used as an attribute of an argument pointer to argument data that is pending a copy? Once the memcpy is made, I assume the byVal is removed viz the arg pointer is replaced with a new arg pointer to the copied data. Thus, must *all* byVal attributes be replaced in the IR? I need to do more reading about other attributes and get more familiar with the IR in general... robert ________________________________________ From: Tim Northover [t.p.northover at gmail.com] Sent: 05 July 2013 07:43 To: Robert Lytton Cc: Subject: Re: [LLVMdev] making a copy of a byval aggregate on the callee's frame Hi Robert, > This should ideally be done early on in the IR in my thinking - to allow optimisation if the data is only ever read. I've thought that once or twice when dealing with ABIs myself. That's certainly another possibility in your case. You could create a separate FunctionPass that gets executed early on and replaces all byval calls and functions with the correct memcpys. It wouldn't work for other targets because they need more control over just where the copy ends up, but it sounds like you shouldn't have an issue. > So, can it be done in the llvm? Yes, in multiple ways. > Should it be done in the llvm? I think so, one way or the other. Tim. From sebastien.deldon at st.com Fri Jul 5 02:08:58 2013 From: sebastien.deldon at st.com (Sebastien DELDON-GNB) Date: Fri, 5 Jul 2013 11:08:58 +0200 Subject: [LLVMdev] Is there a way to check that debug metadata are well formed ? Message-ID: <17F9E444F61B644FAAA6EA20EE53E4DBCF4A4C2058@SAFEX1MAIL2.st.com> Hi all, Is there an easy way to check that debug metadata in a .ll file are well formed ? 
Thanks for you answers Seb From baldrick at free.fr Fri Jul 5 02:19:50 2013 From: baldrick at free.fr (Duncan Sands) Date: Fri, 05 Jul 2013 11:19:50 +0200 Subject: [LLVMdev] making a copy of a byval aggregate on the callee's frame In-Reply-To: References: , Message-ID: <51D68FB6.2020205@free.fr> Hi Robert, suppose you have a "byval" argument with type T*, and the caller passes a T* called %X for it, while in the callee the argument is called %Y. The IR level semantics are: (1) a copy should be made of *%X. Whether the callee or the caller makes the copy depends on the platform ABI. (2) in the callee, %Y refers to the address of this copy. There are many ways (1) can be codegened, it all depends on what the platform ABI says. Examples: - the caller allocates memory on the stack, copies *%X to it, then passes a pointer to the stack memory to the callee as %Y. - the caller passes %X to the callee, the callee allocates memory on the stack, copies *%X to it, then places the address of the copy in %Y - the caller loads *%X into a bunch of registers and passes the registers to the callee. The callee allocates memory on the stack, writes the contents of the registers to it (thus reconstructing *%X), and places the address of the memory in %Y. Which method is used should be specified by the platform ABI. For example, what does GCC do? Ciao, Duncan. On 05/07/13 09:52, Robert Lytton wrote: > Hi Tim, > > Thought about it last night and was coming to the same conclusion. > 1. it cant be done at the end during lowering (target backend). > 2. it should be part of llvm as the byVal needs to be handled. > > As a twist, I have been told that llvm-gcc can lower byVal into memcpy in the callee. > I may take a look at this. > I wonder if it ever emits 'byVal'... > > I still feel I don't understand enough about where byVal is used or what it means. > Is it *only* used as an attribute of an argument pointer to argument data that is pending a copy? 
> Once the memcpy is made, I assume the byVal is removed viz the arg pointer is replaced with a new arg pointer to the copied data. > Thus, must *all* byVal attributes be replaced in the IR? > I need to do more reading about other attributes and get more familiar with the IR in general... > > robert > > ________________________________________ > From: Tim Northover [t.p.northover at gmail.com] > Sent: 05 July 2013 07:43 > To: Robert Lytton > Cc: > Subject: Re: [LLVMdev] making a copy of a byval aggregate on the callee's frame > > Hi Robert, > >> This should ideally be done early on in the IR in my thinking - to allow optimisation if the data is only ever read. > > I've thought that once or twice when dealing with ABIs myself. That's > certainly another possibility in your case. You could create a > separate FunctionPass that gets executed early on and replaces all > byval calls and functions with the correct memcpys. > > It wouldn't work for other targets because they need more control over > just where the copy ends up, but it sounds like you shouldn't have an > issue. > >> So, can it be done in the llvm? > > Yes, in multiple ways. > >> Should it be done in the llvm? > > I think so, one way or the other. > > Tim. > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From robert at xmos.com Fri Jul 5 03:00:02 2013 From: robert at xmos.com (Robert Lytton) Date: Fri, 5 Jul 2013 10:00:02 +0000 Subject: [LLVMdev] making a copy of a byval aggregate on the callee's frame In-Reply-To: References: , , Message-ID: Hi Tim, Correction to my last email. What I should have said is that the new pointer is used by the callee rather than the original byVal pointer arg. (the byVal pointer arg remains but is not used by the callee). 
viz: define void @f1(%struct.tag* byval) { entry: %st = alloca %struct.tag, align 4 %1 = bitcast %struct.tag* %st to i8* %2 = bitcast %struct.tag* %0 to i8* call void @llvm.memcpy.p0i8.p0i8.i32(i8* %1, i8* %2, i32 88, i32 4, i1 false) ; from now on %0 is not used ; callee uses the copy %st instead Also, LowerFormalArguments() is not that late! I just need to understand the process better :-/ As an aside, the Lang Ref states "The copy is considered to belong to the caller not the callee". I guess this has to do with permission rather than location in memory or in time the copy happens. Hence the copy can be made by the callee onto the callee's frame on behalf of the caller! robert ________________________________________ From: llvmdev-bounces at cs.uiuc.edu [llvmdev-bounces at cs.uiuc.edu] on behalf of Robert Lytton [robert at xmos.com] Sent: 05 July 2013 08:52 To: Tim Northover Cc: Subject: Re: [LLVMdev] making a copy of a byval aggregate on the callee's frame Hi Tim, Thought about it last night and was coming to the same conclusion. 1. it cant be done at the end during lowering (target backend). 2. it should be part of llvm as the byVal needs to be handled. As a twist, I have been told that llvm-gcc can lower byVal into memcpy in the callee. I may take a look at this. I wonder if it ever emits 'byVal'... I still feel I don't understand enough about where byVal is used or what it means. Is it *only* used as an attribute of an argument pointer to argument data that is pending a copy? Once the memcpy is made, I assume the byVal is removed viz the arg pointer is replaced with a new arg pointer to the copied data. Thus, must *all* byVal attributes be replaced in the IR? I need to do more reading about other attributes and get more familiar with the IR in general... 
robert ________________________________________ From: Tim Northover [t.p.northover at gmail.com] Sent: 05 July 2013 07:43 To: Robert Lytton Cc: Subject: Re: [LLVMdev] making a copy of a byval aggregate on the callee's frame Hi Robert, > This should ideally be done early on in the IR in my thinking - to allow optimisation if the data is only ever read. I've thought that once or twice when dealing with ABIs myself. That's certainly another possibility in your case. You could create a separate FunctionPass that gets executed early on and replaces all byval calls and functions with the correct memcpys. It wouldn't work for other targets because they need more control over just where the copy ends up, but it sounds like you shouldn't have an issue. > So, can it be done in the llvm? Yes, in multiple ways. > Should it be done in the llvm? I think so, one way or the other. Tim. _______________________________________________ LLVM Developers mailing list LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From t.p.northover at gmail.com Fri Jul 5 03:45:57 2013 From: t.p.northover at gmail.com (Tim Northover) Date: Fri, 5 Jul 2013 11:45:57 +0100 Subject: [LLVMdev] making a copy of a byval aggregate on the callee's frame In-Reply-To: References: Message-ID: > What I should have said is that the new pointer is used by the callee rather than the original byVal pointer arg. Probably, I thought it looked a bit odd. Your code sequence looks reasonable (apart from byval still being present in the argument list; as you said before, lowering should probably remove that). > Also, LowerFormalArguments() is not that late! It's part of the LLVM IR to DAG conversion and target-specific. In LLVM terms that is late. All of the opt passes have run by this point, and they are the ones likely to make use of the extra freedom provided. Cheers. Tim. 
From benoit.noe.perrot at gmail.com Fri Jul 5 04:10:40 2013 From: benoit.noe.perrot at gmail.com (Benoit Perrot) Date: Fri, 5 Jul 2013 13:10:40 +0200 Subject: [LLVMdev] [cfe-dev] llvm (hence Clang) not compiling with Visual Studio 2008 In-Reply-To: References: Message-ID: Hello Ahmed, On Fri, Jul 5, 2013 at 1:43 AM, Ahmed Bougacha wrote: > On Thu, Jul 4, 2013 at 12:48 AM, Benoit Perrot < > benoit.noe.perrot at gmail.com> wrote: > >> >> > I have just updated my svn copy of the llvm/clang repositories after quite >> a long time of inactivity, and found it not compiling on Windows with >> Visual Studio 2008. >> [...] >> Solutions here: >> >> 1. consider that the implementation if effectively wrong, and modify the >> documentation at http://llvm.org/docs/GettingStartedVS.html, requiring >> Visual Studio 2010, i.e. replacing: >> 2. investigate whether there exists a way to disable the aforementioned >> check; >> 3. modify the code in MCModule.cpp to cope with the implementation of >> "lower_bound" in VS 2008. >> >> Personally I just went for (1), i.e. switching to Visual Studio 2010, as >> it was the most straightforward. >> > > (3) Fixed in r185676. > > Requiring VS 2010 for a minor problem like this (even though there are > more like it) isn’t warranted I think. > Great! > Doing so, I also had to add "#include " to the file >> "lib/CodeGen/CGBlocks.cpp" so that llvm/clang can compile with said >> compiler, because of some obscure external template usage. >> > > is already included, at least by StringRef.h, so I’m curious: > what is this obscure thing that needs including it again? > > My build was in an intermediate invalid state. Reverting said modification, cleaning completely then building again just worked fine. Sorry for the false alarm. Thanks! -- Benoit PERROT -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robert at xmos.com Fri Jul 5 04:37:51 2013 From: robert at xmos.com (Robert Lytton) Date: Fri, 5 Jul 2013 11:37:51 +0000 Subject: [LLVMdev] making a copy of a byval aggregate on the callee's frame In-Reply-To: <51D68FB6.2020205@free.fr> References: , , <51D68FB6.2020205@free.fr> Message-ID: Hi Duncan, Thank you, that is helpful. I need the 2nd example for doing (1) I now have a better understanding of the SelectionDAG and LowerFormalArguments() so will see how far I get. (so much to know, so many ways to get lost) Robert ________________________________________ From: llvmdev-bounces at cs.uiuc.edu [llvmdev-bounces at cs.uiuc.edu] on behalf of Duncan Sands [baldrick at free.fr] Sent: 05 July 2013 10:19 To: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] making a copy of a byval aggregate on the callee's frame Hi Robert, suppose you have a "byval" argument with type T*, and the caller passes a T* called %X for it, while in the callee the argument is called %Y. The IR level semantics are: (1) a copy should be made of *%X. Whether the callee or the caller makes the copy depends on the platform ABI. (2) in the callee, %Y refers to the address of this copy. There are many ways (1) can be codegened, it all depends on what the platform ABI says. Examples: - the caller allocates memory on the stack, copies *%X to it, then passes a pointer to the stack memory to the callee as %Y. - the caller passes %X to the callee, the callee allocates memory on the stack, copies *%X to it, then places the address of the copy in %Y - the caller loads *%X into a bunch of registers and passes the registers to the callee. The callee allocates memory on the stack, writes the contents of the registers to it (thus reconstructing *%X), and places the address of the memory in %Y. Which method is used should be specified by the platform ABI. For example, what does GCC do? Ciao, Duncan. 
On 05/07/13 09:52, Robert Lytton wrote: > Hi Tim, > > Thought about it last night and was coming to the same conclusion. > 1. it cant be done at the end during lowering (target backend). > 2. it should be part of llvm as the byVal needs to be handled. > > As a twist, I have been told that llvm-gcc can lower byVal into memcpy in the callee. > I may take a look at this. > I wonder if it ever emits 'byVal'... > > I still feel I don't understand enough about where byVal is used or what it means. > Is it *only* used as an attribute of an argument pointer to argument data that is pending a copy? > Once the memcpy is made, I assume the byVal is removed viz the arg pointer is replaced with a new arg pointer to the copied data. > Thus, must *all* byVal attributes be replaced in the IR? > I need to do more reading about other attributes and get more familiar with the IR in general... > > robert > > ________________________________________ > From: Tim Northover [t.p.northover at gmail.com] > Sent: 05 July 2013 07:43 > To: Robert Lytton > Cc: > Subject: Re: [LLVMdev] making a copy of a byval aggregate on the callee's frame > > Hi Robert, > >> This should ideally be done early on in the IR in my thinking - to allow optimisation if the data is only ever read. > > I've thought that once or twice when dealing with ABIs myself. That's > certainly another possibility in your case. You could create a > separate FunctionPass that gets executed early on and replaces all > byval calls and functions with the correct memcpys. > > It wouldn't work for other targets because they need more control over > just where the copy ends up, but it sounds like you shouldn't have an > issue. > >> So, can it be done in the llvm? > > Yes, in multiple ways. > >> Should it be done in the llvm? > > I think so, one way or the other. > > Tim. 
> > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > _______________________________________________ LLVM Developers mailing list LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From letz at grame.fr Fri Jul 5 05:37:38 2013 From: letz at grame.fr (=?iso-8859-1?Q?St=E9phane_Letz?=) Date: Fri, 5 Jul 2013 14:37:38 +0200 Subject: [LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR In-Reply-To: <51D62B4E.9050406@grosser.es> References: <51D62B4E.9050406@grosser.es> Message-ID: <099B9C2B-8E3C-40C0-91E3-CEE78514148D@grame.fr> Le 5 juil. 2013 à 04:11, Tobias Grosser a écrit : > On 07/04/2013 01:39 PM, Stéphane Letz wrote: >> Hi, >> >> Our DSL can generate C or directly generate LLVM IR. With LLVM 3.3, we can vectorize the C produced code using clang with -O3, or clang with -O1 then opt -O3 -vectorize-loops. But the same program generating LLVM IR version cannot be vectorized with opt -O3 -vectorize-loops. So our guess is that our generated LLVM IR lacks some informations that are needed by the vectorization passes to correctly work. >> >> Any idea of what could be lacking? > > Without any knowledge about the code guessing is hard. You may miss the 'noalias' keyword or nsw/nuw flags, but there are many possibilities. > > If you add '-debug' to opt you may get some hints. Also, if you have a small test case, posting the LLVM-IR may help. 
> > Cheers, > Tobias > Hi Tobias, 1) Here is a simple C loop generated by our C backend: void computemydsp(mydsp* dsp, int count, float** inputs, float** outputs) { float* input0 = inputs[0]; float* input1 = inputs[1]; float* output0 = outputs[0]; /* C99 loop */ { int i; for (i = 0; (i < count); i = (i + 1)) { output0[i] = (float)((float)input0[i] + (float)input1[i]); } } } 2) Compiling it with "clang -O3" vectorize it directly: define void @computemydsp(%struct.mydsp* nocapture %dsp, i32 %count, float** nocapture %inputs, float** nocapture %outputs) #0 { entry: %0 = load float** %inputs, align 8, !tbaa !3 %arrayidx1 = getelementptr inbounds float** %inputs, i64 1 %1 = load float** %arrayidx1, align 8, !tbaa !3 %2 = load float** %outputs, align 8, !tbaa !3 %cmp14 = icmp sgt i32 %count, 0 br i1 %cmp14, label %for.body.lr.ph, label %for.end for.body.lr.ph: ; preds = %entry %cnt.cast = zext i32 %count to i64 %n.vec = and i64 %cnt.cast, 4294967288 %cmp.zero = icmp eq i64 %n.vec, 0 %3 = add i32 %count, -1 %4 = zext i32 %3 to i64 %scevgep = getelementptr float* %2, i64 %4 br i1 %cmp.zero, label %middle.block, label %vector.memcheck vector.memcheck: ; preds = %for.body.lr.ph %scevgep19 = getelementptr float* %1, i64 %4 %scevgep17 = getelementptr float* %0, i64 %4 %bound122 = icmp ule float* %1, %scevgep %bound021 = icmp ule float* %2, %scevgep19 %bound1 = icmp ule float* %0, %scevgep %bound0 = icmp ule float* %2, %scevgep17 %found.conflict23 = and i1 %bound021, %bound122 %found.conflict = and i1 %bound0, %bound1 %conflict.rdx = or i1 %found.conflict, %found.conflict23 br i1 %conflict.rdx, label %middle.block, label %vector.body vector.body: ; preds = %vector.memcheck, %vector.body %index = phi i64 [ %index.next, %vector.body ], [ 0, %vector.memcheck ] %5 = getelementptr inbounds float* %0, i64 %index %6 = bitcast float* %5 to <4 x float>* %wide.load = load <4 x float>* %6, align 4 %.sum32 = or i64 %index, 4 %7 = getelementptr float* %0, i64 %.sum32 %8 = bitcast float* %7 
to <4 x float>*
  %wide.load25 = load <4 x float>* %8, align 4
  %9 = getelementptr inbounds float* %1, i64 %index
  %10 = bitcast float* %9 to <4 x float>*
  %wide.load26 = load <4 x float>* %10, align 4
  %.sum33 = or i64 %index, 4
  %11 = getelementptr float* %1, i64 %.sum33
  %12 = bitcast float* %11 to <4 x float>*
  %wide.load27 = load <4 x float>* %12, align 4
  %13 = fadd <4 x float> %wide.load, %wide.load26
  %14 = fadd <4 x float> %wide.load25, %wide.load27
  %15 = getelementptr inbounds float* %2, i64 %index
  %16 = bitcast float* %15 to <4 x float>*
  store <4 x float> %13, <4 x float>* %16, align 4
  %.sum34 = or i64 %index, 4
  %17 = getelementptr float* %2, i64 %.sum34
  %18 = bitcast float* %17 to <4 x float>*
  store <4 x float> %14, <4 x float>* %18, align 4
  %index.next = add i64 %index, 8
  %19 = icmp eq i64 %index.next, %n.vec
  br i1 %19, label %middle.block, label %vector.body

middle.block:                       ; preds = %vector.body, %vector.memcheck, %for.body.lr.ph
  %resume.val = phi i64 [ 0, %for.body.lr.ph ], [ 0, %vector.memcheck ], [ %n.vec, %vector.body ]
  %cmp.n = icmp eq i64 %cnt.cast, %resume.val
  br i1 %cmp.n, label %for.end, label %for.body

for.body:                           ; preds = %middle.block, %for.body
  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ %resume.val, %middle.block ]
  %arrayidx3 = getelementptr inbounds float* %0, i64 %indvars.iv
  %20 = load float* %arrayidx3, align 4, !tbaa !4
  %arrayidx5 = getelementptr inbounds float* %1, i64 %indvars.iv
  %21 = load float* %arrayidx5, align 4, !tbaa !4
  %add = fadd float %20, %21
  %arrayidx7 = getelementptr inbounds float* %2, i64 %indvars.iv
  store float %add, float* %arrayidx7, align 4, !tbaa !4
  %indvars.iv.next = add i64 %indvars.iv, 1
  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
  %exitcond = icmp eq i32 %lftr.wideiv, %count
  br i1 %exitcond, label %for.end, label %for.body, !llvm.vectorizer.already_vectorized !5

for.end:                            ; preds = %middle.block, %for.body, %entry
  ret void
}

; Function Attrs: nounwind ssp uwtable
define i32 @main(i32 %argc, i8** nocapture %argv) #0 {
entry:
  ret i32 0
}

3) compiling it with "clang -O1"

; Function Attrs: nounwind ssp uwtable
define void @computemydsp(%struct.mydsp* nocapture %dsp, i32 %count, float** nocapture %inputs, float** nocapture %outputs) #0 {
entry:
  %0 = load float** %inputs, align 8, !tbaa !3
  %arrayidx1 = getelementptr inbounds float** %inputs, i64 1
  %1 = load float** %arrayidx1, align 8, !tbaa !3
  %2 = load float** %outputs, align 8, !tbaa !3
  %cmp14 = icmp sgt i32 %count, 0
  br i1 %cmp14, label %for.body, label %for.end

for.body:                           ; preds = %entry, %for.body
  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
  %arrayidx3 = getelementptr inbounds float* %0, i64 %indvars.iv
  %3 = load float* %arrayidx3, align 4, !tbaa !4
  %arrayidx5 = getelementptr inbounds float* %1, i64 %indvars.iv
  %4 = load float* %arrayidx5, align 4, !tbaa !4
  %add = fadd float %3, %4
  %arrayidx7 = getelementptr inbounds float* %2, i64 %indvars.iv
  store float %add, float* %arrayidx7, align 4, !tbaa !4
  %indvars.iv.next = add i64 %indvars.iv, 1
  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
  %exitcond = icmp eq i32 %lftr.wideiv, %count
  br i1 %exitcond, label %for.end, label %for.body

for.end:                            ; preds = %for.body, %entry
  ret void
}

4) then using "opt -O3 -vectorize-loops" vectorize it:

; Function Attrs: nounwind ssp uwtable
define void @computemydsp(%struct.mydsp* nocapture %dsp, i32 %count, float** nocapture %inputs, float** nocapture %outputs) #0 {
entry:
  %0 = load float** %inputs, align 8, !tbaa !3
  %arrayidx1 = getelementptr inbounds float** %inputs, i64 1
  %1 = load float** %arrayidx1, align 8, !tbaa !3
  %2 = load float** %outputs, align 8, !tbaa !3
  %cmp14 = icmp sgt i32 %count, 0
  br i1 %cmp14, label %for.body.preheader, label %for.end

for.body.preheader:                 ; preds = %entry
  %cnt.cast = zext i32 %count to i64
  %3 = urem i32 %count, 24
  %n.mod.vf = zext i32 %3 to i64
  %n.vec = sub i64 %cnt.cast, %n.mod.vf
  %cmp.zero = icmp eq i32 %3, %count
  %4 = add i32 %count, -1
  %5 = zext i32 %4 to i64
  %scevgep = getelementptr float* %2, i64 %5
  br i1 %cmp.zero, label %middle.block, label %vector.memcheck

vector.memcheck:                    ; preds = %for.body.preheader
  %scevgep6 = getelementptr float* %1, i64 %5
  %scevgep4 = getelementptr float* %0, i64 %5
  %bound19 = icmp ule float* %1, %scevgep
  %bound08 = icmp ule float* %2, %scevgep6
  %bound1 = icmp ule float* %0, %scevgep
  %bound0 = icmp ule float* %2, %scevgep4
  %found.conflict10 = and i1 %bound08, %bound19
  %found.conflict = and i1 %bound0, %bound1
  %conflict.rdx = or i1 %found.conflict, %found.conflict10
  br i1 %conflict.rdx, label %middle.block, label %vector.body

vector.body:                        ; preds = %vector.memcheck, %vector.body
  %index = phi i64 [ %index.next, %vector.body ], [ 0, %vector.memcheck ]
  %6 = getelementptr inbounds float* %0, i64 %index
  %7 = bitcast float* %6 to <8 x float>*
  %wide.load = load <8 x float>* %7, align 4
  %.sum = add i64 %index, 8
  %8 = getelementptr float* %0, i64 %.sum
  %9 = bitcast float* %8 to <8 x float>*
  %wide.load13 = load <8 x float>* %9, align 4
  %.sum23 = add i64 %index, 16
  %10 = getelementptr float* %0, i64 %.sum23
  %11 = bitcast float* %10 to <8 x float>*
  %wide.load14 = load <8 x float>* %11, align 4
  %12 = getelementptr inbounds float* %1, i64 %index
  %13 = bitcast float* %12 to <8 x float>*
  %wide.load15 = load <8 x float>* %13, align 4
  %.sum24 = add i64 %index, 8
  %14 = getelementptr float* %1, i64 %.sum24
  %15 = bitcast float* %14 to <8 x float>*
  %wide.load16 = load <8 x float>* %15, align 4
  %.sum25 = add i64 %index, 16
  %16 = getelementptr float* %1, i64 %.sum25
  %17 = bitcast float* %16 to <8 x float>*
  %wide.load17 = load <8 x float>* %17, align 4
  %18 = fadd <8 x float> %wide.load, %wide.load15
  %19 = fadd <8 x float> %wide.load13, %wide.load16
  %20 = fadd <8 x float> %wide.load14, %wide.load17
  %21 = getelementptr inbounds float* %2, i64 %index
  %22 = bitcast float* %21 to <8 x float>*
  store <8 x float> %18, <8 x float>* %22, align 4
  %.sum26 = add i64 %index, 8
  %23 = getelementptr float* %2, i64 %.sum26
  %24 = bitcast float* %23 to <8 x float>*
  store <8 x float> %19, <8 x float>* %24, align 4
  %.sum27 = add i64 %index, 16
  %25 = getelementptr float* %2, i64 %.sum27
  %26 = bitcast float* %25 to <8 x float>*
  store <8 x float> %20, <8 x float>* %26, align 4
  %index.next = add i64 %index, 24
  %27 = icmp eq i64 %index.next, %n.vec
  br i1 %27, label %middle.block, label %vector.body

middle.block:                       ; preds = %vector.body, %vector.memcheck, %for.body.preheader
  %resume.val = phi i64 [ 0, %for.body.preheader ], [ 0, %vector.memcheck ], [ %n.vec, %vector.body ]
  %cmp.n = icmp eq i64 %cnt.cast, %resume.val
  br i1 %cmp.n, label %for.end, label %for.body

for.body:                           ; preds = %middle.block, %for.body
  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ %resume.val, %middle.block ]
  %arrayidx3 = getelementptr inbounds float* %0, i64 %indvars.iv
  %28 = load float* %arrayidx3, align 4, !tbaa !4
  %arrayidx5 = getelementptr inbounds float* %1, i64 %indvars.iv
  %29 = load float* %arrayidx5, align 4, !tbaa !4
  %add = fadd float %28, %29
  %arrayidx7 = getelementptr inbounds float* %2, i64 %indvars.iv
  store float %add, float* %arrayidx7, align 4, !tbaa !4
  %indvars.iv.next = add i64 %indvars.iv, 1
  %lftr.wideiv1 = trunc i64 %indvars.iv.next to i32
  %exitcond2 = icmp eq i32 %lftr.wideiv1, %count
  br i1 %exitcond2, label %for.end, label %for.body, !llvm.vectorizer.already_vectorized !5

for.end:                            ; preds = %middle.block, %for.body, %entry
  ret void
}

5) producing LLVM IR with our LLVM backend:

define void @compute_mydsp(%struct.dsp_mydsp* %dsp, i32 %fullcount, float** noalias %inputs, float** noalias %outputs) {
block_code:
  br label %code_block

code_block:                         ; preds = %block_code
  %0 = getelementptr inbounds float** %inputs, i32 0
  %1 = load float** %0
  %2 = getelementptr inbounds %struct.dsp_mydsp* %dsp, i32 0, i32 0
  store float* %1, float** %2
  %fInput0 = alloca float*
  %3 = getelementptr inbounds float** %inputs, i32 1
  %4 = load float** %3
  %5 = getelementptr inbounds %struct.dsp_mydsp* %dsp, i32 0, i32 1
  store float* %4, float** %5
  %fInput1 = alloca float*
  %6 = getelementptr inbounds float** %outputs, i32 0
  %7 = load float** %6
  %8 = getelementptr inbounds %struct.dsp_mydsp* %dsp, i32 0, i32 2
  store float* %7, float** %8
  %fOutput0 = alloca float*
  br label %init_block

init_block:                         ; preds = %code_block
  %index = alloca i32
  store i32 0, i32* %index
  br label %exec_block

exec_block:                         ; preds = %exit_block6, %init_block
  %index1 = phi i32 [ 0, %init_block ], [ %next_index9, %exit_block6 ]
  %9 = load i32* %index
  %10 = icmp slt i32 %9, %fullcount
  %11 = select i1 %10, i32 1, i32 0
  %12 = trunc i32 %11 to i1
  br i1 %12, label %loop_body_block, label %exit_block

loop_body_block:                    ; preds = %exec_block
  br label %code_block2

exit_block:                         ; preds = %exec_block
  br label %return

code_block2:                        ; preds = %loop_body_block
  %13 = load i32* %index
  %14 = getelementptr inbounds %struct.dsp_mydsp* %dsp, i64 0, i32 0
  %15 = load float** %14
  %16 = getelementptr inbounds float* %15, i32 %13
  store float* %16, float** %fInput0
  %17 = load i32* %index
  %18 = getelementptr inbounds %struct.dsp_mydsp* %dsp, i64 0, i32 1
  %19 = load float** %18
  %20 = getelementptr inbounds float* %19, i32 %17
  store float* %20, float** %fInput1
  %21 = load i32* %index
  %22 = getelementptr inbounds %struct.dsp_mydsp* %dsp, i64 0, i32 2
  %23 = load float** %22
  %24 = getelementptr inbounds float* %23, i32 %21
  store float* %24, float** %fOutput0
  %count = alloca i32
  %25 = load i32* %index
  %26 = sub i32 %fullcount, %25
  %27 = icmp slt i32 32, %26
  %28 = select i1 %27, i32 32, i32 %26
  store i32 %28, i32* %count
  br label %init_block3

init_block3:                        ; preds = %code_block2
  %i = alloca i32
  store i32 0, i32* %i
  br label %exec_block4

exec_block4:                        ; preds = %code_block8, %init_block3
  %i7 = phi i32 [ 0, %init_block3 ], [ %next_index, %code_block8 ]
  %29 = load i32* %i
  %30 = load i32* %count
  %31 = icmp slt i32 %29, %30
  %32 = select i1 %31, i32 1, i32 0
  %33 = trunc i32 %32 to i1
  br i1 %33, label %loop_body_block5, label %exit_block6

loop_body_block5:                   ; preds = %exec_block4
  br label %code_block8

exit_block6:                        ; preds = %exec_block4
  %34 = load i32* %index
  %next_index9 = add i32 %34, 32
  store i32 %next_index9, i32* %index
  br label %exec_block

code_block8:                        ; preds = %loop_body_block5
  %35 = load i32* %i
  %36 = load float** %fOutput0
  %37 = getelementptr inbounds float* %36, i32 %35
  %38 = load i32* %i
  %39 = load float** %fInput0
  %40 = getelementptr inbounds float* %39, i32 %38
  %41 = load float* %40
  %42 = load i32* %i
  %43 = load float** %fInput1
  %44 = getelementptr inbounds float* %43, i32 %42
  %45 = load float* %44
  %46 = fadd float %41, %45
  store float %46, float* %37
  %47 = load i32* %i
  %next_index = add i32 %47, 1
  store i32 %next_index, i32* %i
  br label %exec_block4

return:                             ; preds = %exit_block
  ret void
}

6) Then using "opt -O3 -vectorize-loops" *does not* vectorize it:

; Function Attrs: nounwind
define void @compute_mydsp(%struct.dsp_mydsp* nocapture %dsp, i32 %fullcount, float** noalias nocapture %inputs, float** noalias nocapture %outputs) #0 {
block_code:
  %0 = load float** %inputs
  %1 = getelementptr inbounds %struct.dsp_mydsp* %dsp, i32 0, i32 0
  store float* %0, float** %1
  %2 = getelementptr inbounds float** %inputs, i32 1
  %3 = load float** %2
  %4 = getelementptr inbounds %struct.dsp_mydsp* %dsp, i32 0, i32 1
  store float* %3, float** %4
  %5 = load float** %outputs
  %6 = getelementptr inbounds %struct.dsp_mydsp* %dsp, i32 0, i32 2
  store float* %5, float** %6
  %7 = icmp sgt i32 %fullcount, 0
  br i1 %7, label %code_block2.lr.ph, label %return

code_block2.lr.ph:                  ; preds = %block_code
  %8 = getelementptr inbounds %struct.dsp_mydsp* %dsp, i64 0, i32 0
  %9 = getelementptr inbounds %struct.dsp_mydsp* %dsp, i64 0, i32 1
  %10 = getelementptr inbounds %struct.dsp_mydsp* %dsp, i64 0, i32 2
  br label %code_block2

code_block2:                        ; preds = %exit_block6, %code_block2.lr.ph
  %next_index95 = phi i32 [ 0, %code_block2.lr.ph ], [ %next_index9, %exit_block6 ]
  %11 = load float** %8
  %12 = load float** %9
  %13 = load float** %10
  %14 = sub i32 %fullcount, %next_index95
  %15 = icmp sgt i32 %14, 32
  %16 = select i1 %15, i32 32, i32 %14
  %17 = icmp sgt i32 %16, 0
  br i1 %17, label %code_block8, label %exit_block6

exit_block6:                        ; preds = %code_block8, %code_block2
  %next_index9 = add i32 %next_index95, 32
  %18 = icmp slt i32 %next_index9, %fullcount
  br i1 %18, label %code_block2, label %return

code_block8:                        ; preds = %code_block2, %code_block8
  %next_index3 = phi i32 [ %next_index, %code_block8 ], [ 0, %code_block2 ]
  %.sum = add i32 %next_index95, %next_index3
  %19 = getelementptr inbounds float* %13, i32 %.sum
  %.sum8 = add i32 %next_index95, %next_index3
  %20 = getelementptr inbounds float* %11, i32 %.sum8
  %21 = load float* %20
  %22 = getelementptr inbounds float* %12, i32 %.sum
  %23 = load float* %22
  %24 = fadd float %21, %23
  store float %24, float* %19
  %next_index = add i32 %next_index3, 1
  %25 = icmp slt i32 %next_index, %16
  br i1 %25, label %code_block8, label %exit_block6

return:                             ; preds = %exit_block6, %block_code
  ret void
}

Any idea what is wrong then?

Thanks

Stéphane Letz

From baldrick at free.fr Fri Jul 5 05:47:54 2013
From: baldrick at free.fr (Duncan Sands)
Date: Fri, 05 Jul 2013 14:47:54 +0200
Subject: [LLVMdev] Is there a way to check that debug metadata are well formed ?
In-Reply-To: <17F9E444F61B644FAAA6EA20EE53E4DBCF4A4C2058@SAFEX1MAIL2.st.com>
References: <17F9E444F61B644FAAA6EA20EE53E4DBCF4A4C2058@SAFEX1MAIL2.st.com>
Message-ID: <51D6C07A.5090109@free.fr>

Hi Seb,

On 05/07/13 11:08, Sebastien DELDON-GNB wrote:
> Hi all,
>
> Is there an easy way to check that debug metadata in a .ll file are well formed ?
>
> Thanks for your answers

I don't think so. It would be great if the verifier checked debug and other
standard meta data.

Ciao, Duncan.

From vnorilo at siba.fi Fri Jul 5 05:59:36 2013
From: vnorilo at siba.fi (Vesa Norilo)
Date: Fri, 05 Jul 2013 15:59:36 +0300
Subject: [LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR
In-Reply-To: 
References: 
Message-ID: 

Hi,

For libfaust, perhaps?
:) Could it be something as simple as a target triple defined for the module?
Without knowledge of the target machine vector width, the vectorizer will
assume maximum width of 1. You can override this without a triple by using
the -force-vector-width switch.

Vesa

> Hi,
>
> Our DSL can generate C or directly generate LLVM IR. With LLVM 3.3, we
> can vectorize the C produced code using clang with -O3, or clang with
> -O1 then opt -O3 -vectorize-loops. But the same program generating LLVM
> IR version cannot be vectorized with opt -O3 -vectorize-loops. So our
> guess is that our generated LLVM IR lacks some informations that are
> needed by the vectorization passes to correctly work.
>
> Any idea of what could be lacking?
>
> Thanks
>
> Stéphane Letz

From baldrick at free.fr Fri Jul 5 06:03:49 2013
From: baldrick at free.fr (Duncan Sands)
Date: Fri, 05 Jul 2013 15:03:49 +0200
Subject: [LLVMdev] Docs question: legality of inspecting other functions in a function pass
In-Reply-To: 
References: <1A9CAAEA-2840-459E-B860-F83D64743CC5@cl.cam.ac.uk>
Message-ID: <51D6C435.2050400@free.fr>

Hi Stephen,

On 04/07/13 20:21, Stephen Lin wrote:
> On Thu, Jul 4, 2013 at 1:45 AM, David Chisnall wrote:
>> On 3 Jul 2013, at 23:05, Stephen Lin wrote:
>>
>>> Does anyone know if there's a defined policy about this, either way?
>>> If so, I think it ought to be noted in the docs, for consistency.
>>
>> The prohibition exists, at least in part, because in theory it would be nice to be able to run passes in parallel. It's not a real limitation at the moment because updating instructions in a module is not thread safe (and making it so with the current APIs would probably be somewhat problematic in terms of performance) and so when we do eventually get the ability to run FunctionPasses in parallel they will most likely need new APIs.
That said, it's a good idea structurally to view the Function / Block as synchronisation boundaries so that it will be easier to support concurrent execution in the future.
>>
>
> I understand the rationale but are you sure that the prohibition
> against *inspecting* other functions during a function pass does exist
> and is currently followed? If it does I think the docs ought to make
> that clear so I want to make sure if the omission is not deliberate.
>
> In theory you could still parallelize function pass execution if they
> inspected other functions if they used some kind of read/write locking
> and used transactional updates; I would think the main point is that
> we want the results to be deterministic and not dependent on the order
> in which functions are processed, which applies regardless of what
> kind of parallelization and/or synchronization is used.

both dragonegg and clang (AFAIK) run some function passes on each function
in turn as they are turned into LLVM IR. If such a function pass tried to
inspect other functions then they won't be able to see all function bodies
because they haven't all been output yet. And which functions do have bodies
available to be inspected depends on the order in which clang decides to
output functions, so the results of running the pass would depend on that
order.

Ciao, Duncan.

From letz at grame.fr Fri Jul 5 07:50:16 2013
From: letz at grame.fr (=?windows-1252?Q?St=E9phane_Letz?=)
Date: Fri, 5 Jul 2013 16:50:16 +0200
Subject: [LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR
In-Reply-To: <51D62B4E.9050406@grosser.es>
References: <51D62B4E.9050406@grosser.es>
Message-ID: 

On Jul 5, 2013, at 04:11, Tobias Grosser wrote:

> On 07/04/2013 01:39 PM, Stéphane Letz wrote:
>> Hi,
>>
>> Our DSL can generate C or directly generate LLVM IR. With LLVM 3.3, we can vectorize the C produced code using clang with -O3, or clang with -O1 then opt -O3 -vectorize-loops.
But the same program generating LLVM IR version cannot be vectorized with opt -O3 -vectorize-loops. So our guess is that our generated LLVM IR lacks some informations that are needed by the vectorization passes to correctly work.
>>
>> Any idea of what could be lacking?
>
> Without any knowledge about the code guessing is hard. You may miss the 'noalias' keyword or nsw/nuw flags, but there are many possibilities.
>
> If you add '-debug' to opt you may get some hints. Also, if you have a small test case, posting the LLVM-IR may help.
>
> Cheers,
> Tobias
>

I did some progress:

1) adding a DataLayout in my generated LLVM Module, explicitly as a string. BTW is there any notion of "default" DataLayout that could be used? How is a LLVM Module supposed to know which DataLayout to use (in general)?

2) next the resulting module could not be vectorized with "opt -O3 -vectorize-loops -debug -S m1.ll -o m2.ll", but if I do in "two steps" like:

opt -O3 -vectorize-loops -debug -S m1.ll -o m2.ll

opt -O3 -vectorize-loops -debug -S m2.ll -o m3.ll

then it works….

Any idea?

Thanks.

Stéphane Letz

From swlin at post.harvard.edu Fri Jul 5 08:14:52 2013
From: swlin at post.harvard.edu (Stephen Lin)
Date: Fri, 5 Jul 2013 08:14:52 -0700
Subject: [LLVMdev] Docs question: legality of inspecting other functions in a function pass
In-Reply-To: <51D6C435.2050400@free.fr>
References: <1A9CAAEA-2840-459E-B860-F83D64743CC5@cl.cam.ac.uk> <51D6C435.2050400@free.fr>
Message-ID: 

> both dragonegg and clang (AFAIK) run some function passes on each function
> in turn as they are turned into LLVM IR. If such a function pass tried to
> inspect other functions then they won't be able to see all function bodies
> because they haven't all been output yet. And which functions do have
> bodies
> available to be inspected depends on the order in which clang decides to
> output functions, so the results of running the pass would depend on that
> order.
>
> Ciao, Duncan.
OK, so to be clear, the docs are incomplete and need to be updated?
Just trying to get explicit confirmation before I patch this...

Stephen

From aschwaighofer at apple.com Fri Jul 5 08:23:25 2013
From: aschwaighofer at apple.com (Arnold Schwaighofer)
Date: Fri, 05 Jul 2013 10:23:25 -0500
Subject: [LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR
In-Reply-To: 
References: <51D62B4E.9050406@grosser.es>
Message-ID: <4E160537-3556-4BBD-A406-AD5EF00D1DF0@apple.com>

On Jul 5, 2013, at 9:50 AM, Stéphane Letz wrote:

>
> On Jul 5, 2013, at 04:11, Tobias Grosser wrote:
>
>> On 07/04/2013 01:39 PM, Stéphane Letz wrote:
>>> Hi,
>>>
>>> Our DSL can generate C or directly generate LLVM IR. With LLVM 3.3, we can vectorize the C produced code using clang with -O3, or clang with -O1 then opt -O3 -vectorize-loops. But the same program generating LLVM IR version cannot be vectorized with opt -O3 -vectorize-loops. So our guess is that our generated LLVM IR lacks some informations that are needed by the vectorization passes to correctly work.
>>>
>>> Any idea of what could be lacking?
>>
>> Without any knowledge about the code guessing is hard. You may miss the 'noalias' keyword or nsw/nuw flags, but there are many possibilities.
>>
>> If you add '-debug' to opt you may get some hints. Also, if you have a small test case, posting the LLVM-IR may help.
>>
>> Cheers,
>> Tobias
>>
>
>
> I did some progress:
>
> 1) adding a DataLayout in my generated LLVM Module, explicitly as a string. BTW is there any notion of "default" DataLayout that could be used? How is a LLVM Module supposed to know which DataLayout to use (in general)?
>
> 2) next the resulting module could not be vectorized with "opt -O3 -vectorize-loops -debug -S m1.ll -o m2.ll", but if I do in "two steps" like:
>
> opt -O3 -vectorize-loops -debug -S m1.ll -o m2.ll
>
> opt -O3 -vectorize-loops -debug -S m2.ll -o m3.ll
>
> then it works….
>
> Any idea?
>
> Thanks.
> > Stéphane Letz

Hi Stephane,

Move the alloca for “i” into the entry block.

The IR coming into the loop vectorizer looks something like the following. The loop vectorizer can't recognize one of the phis as an induction or reduction, so it gives up.

The reason why you have this “odd” phi is because SROA (which transforms allocas into SSA variables) does not get rid of the “i” variable (later passes do but leave this odd IR around) because “i”’s alloca is not in the entry block - it only works on allocas in the entry block.

opt -O3 -vectorize-loops -debug-only=loop-vectorize < test.ll

LV: Found a loop: code_block8
LV: Found an induction variable.
LV: PHI is not a poly recurrence.
LV: Found an unidentified PHI. %storemerge8 = phi i32 [ 0, %code_block8.lr.ph ], [ %next_index, %code_block8 ]
LV: Can't vectorize the instructions or CFG
LV: Not vectorizing.

IR coming into the vectorizer:

code_block8:                        ; preds = %code_block8.lr.ph, %code_block8
  %next_index10 = phi i32 [ %i.promoted, %code_block8.lr.ph ], [ %next_index, %code_block8 ]
  %storemerge8 = phi i32 [ 0, %code_block8.lr.ph ], [ %next_index, %code_block8 ]   ; <<< THIS phi is the problem.
  %20 = sext i32 %storemerge8 to i64
  %.sum = add i64 %20, %9
  %21 = getelementptr inbounds float* %11, i64 %.sum
  %22 = getelementptr inbounds float* %8, i64 %.sum
  %23 = load float* %22, align 4
  %24 = getelementptr inbounds float* %10, i64 %.sum
  %25 = load float* %24, align 4
  %26 = fadd float %23, %25
  store float %26, float* %21, align 4
  %next_index = add i32 %next_index10, 1
  %27 = icmp slt i32 %next_index, %16
  br i1 %27, label %code_block8, label %exec_block4.exit_block6_crit_edge

exec_block.return_crit_edge:        ; preds = %exit_block6
  br label %return

return:                             ; preds = %exec_block.return_crit_edge, %block_code
  ret void
}

From dblaikie at gmail.com Fri Jul 5 08:41:14 2013
From: dblaikie at gmail.com (David Blaikie)
Date: Fri, 5 Jul 2013 08:41:14 -0700
Subject: [LLVMdev] Is there a way to check that debug metadata are well formed ?
In-Reply-To: <51D6C07A.5090109@free.fr>
References: <17F9E444F61B644FAAA6EA20EE53E4DBCF4A4C2058@SAFEX1MAIL2.st.com> <51D6C07A.5090109@free.fr>
Message-ID: 

On Jul 5, 2013 5:50 AM, "Duncan Sands" wrote:
>
> Hi Seb,
>
>
> On 05/07/13 11:08, Sebastien DELDON-GNB wrote:
>>
>> Hi all,
>>
>> Is there an easy way to check that debug metadata in a .ll file are well formed ?
>>
>> Thanks for your answers
>
>
> I don't think so.

That's correct. There's no formal/single verification of debug info metadata. There are asserts scattered throughout debug info handling, but they're hardly complete.

> It would be great if the verifier checked debug and other
> standard meta data.

Yep. It's something we've considered but haven't found time for yet. It would allow us to remove much of the ad hoc testing/verification and consolidate it in one place.

For now, ideally, anything produced by DIBuilder should be valid. If you find cases where that's not true, we should at least add asserts to DIBuilder to shore that up a bit.

>
> Ciao, Duncan.
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From letz at grame.fr Fri Jul 5 08:43:12 2013
From: letz at grame.fr (=?windows-1252?Q?St=E9phane_Letz?=)
Date: Fri, 5 Jul 2013 17:43:12 +0200
Subject: [LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR
In-Reply-To: <4E160537-3556-4BBD-A406-AD5EF00D1DF0@apple.com>
References: <51D62B4E.9050406@grosser.es> <4E160537-3556-4BBD-A406-AD5EF00D1DF0@apple.com>
Message-ID: 

On Jul 5, 2013, at 17:23, Arnold Schwaighofer wrote:

>
> On Jul 5, 2013, at 9:50 AM, Stéphane Letz wrote:
>
>>
>> On Jul 5, 2013, at 04:11, Tobias Grosser wrote:
>>
>>> On 07/04/2013 01:39 PM, Stéphane Letz wrote:
>>>> Hi,
>>>>
>>>> Our DSL can generate C or directly generate LLVM IR.
With LLVM 3.3, we can vectorize the C produced code using clang with -O3, or clang with -O1 then opt -O3 -vectorize-loops. But the same program generating LLVM IR version cannot be vectorized with opt -O3 -vectorize-loops. So our guess is that our generated LLVM IR lacks some informations that are needed by the vectorization passes to correctly work.
>>>>
>>>> Any idea of what could be lacking?
>>>
>>> Without any knowledge about the code guessing is hard. You may miss the 'noalias' keyword or nsw/nuw flags, but there are many possibilities.
>>>
>>> If you add '-debug' to opt you may get some hints. Also, if you have a small test case, posting the LLVM-IR may help.
>>>
>>> Cheers,
>>> Tobias
>>>
>>
>>
>> I did some progress:
>>
>> 1) adding a DataLayout in my generated LLVM Module, explicitly as a string. BTW is there any notion of "default" DataLayout that could be used? How is a LLVM Module supposed to know which DataLayout to use (in general)?
>>
>> 2) next the resulting module could not be vectorized with "opt -O3 -vectorize-loops -debug -S m1.ll -o m2.ll", but if I do in "two steps" like:
>>
>> opt -O3 -vectorize-loops -debug -S m1.ll -o m2.ll
>>
>> opt -O3 -vectorize-loops -debug -S m2.ll -o m3.ll
>>
>> then it works….
>>
>> Any idea?
>>
>> Thanks.
>>
>> Stéphane Letz
>
> Hi Stephane,
>
> Move the alloca for “i” into the entry block.
>
> The IR coming into the loop vectorizer looks something like the following. The loop vectorizer can't recognize one of the phis as an induction or reduction, so it gives up.
>
> The reason why you have this “odd” phi is because SROA (which transforms allocas into SSA variables) does not get rid of the “i” variable (later passes do but leave this odd IR around) because “i”’s alloca is not in the entry block - it only works on allocas in the entry block.
>
> opt -O3 -vectorize-loops -debug-only=loop-vectorize < test.ll
>
> LV: Found a loop: code_block8
> LV: Found an induction variable.
> LV: PHI is not a poly recurrence.
> LV: Found an unidentified PHI. %storemerge8 = phi i32 [ 0, %code_block8.lr.ph ], [ %next_index, %code_block8 ]
> LV: Can't vectorize the instructions or CFG
> LV: Not vectorizing.
>
> IR coming into the vectorizer:
>
> code_block8:                        ; preds = %code_block8.lr.ph, %code_block8
>   %next_index10 = phi i32 [ %i.promoted, %code_block8.lr.ph ], [ %next_index, %code_block8 ]
>   %storemerge8 = phi i32 [ 0, %code_block8.lr.ph ], [ %next_index, %code_block8 ]   ; <<< THIS phi is the problem.
>   %20 = sext i32 %storemerge8 to i64
>   %.sum = add i64 %20, %9
>   %21 = getelementptr inbounds float* %11, i64 %.sum
>   %22 = getelementptr inbounds float* %8, i64 %.sum
>   %23 = load float* %22, align 4
>   %24 = getelementptr inbounds float* %10, i64 %.sum
>   %25 = load float* %24, align 4
>   %26 = fadd float %23, %25
>   store float %26, float* %21, align 4
>   %next_index = add i32 %next_index10, 1
>   %27 = icmp slt i32 %next_index, %16
>   br i1 %27, label %code_block8, label %exec_block4.exit_block6_crit_edge
>
> exec_block.return_crit_edge:        ; preds = %exit_block6
>   br label %return
>
> return:                             ; preds = %exec_block.return_crit_edge, %block_code
>   ret void
> }
>

1) "entry" block is the first block of the function right?

2) do you mean *all* "alloca" in a function always have to be in the first entry block?

Thanks.

Stéphane

From aschwaighofer at apple.com Fri Jul 5 08:48:07 2013
From: aschwaighofer at apple.com (Arnold Schwaighofer)
Date: Fri, 05 Jul 2013 10:48:07 -0500
Subject: [LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR
In-Reply-To: 
References: <51D62B4E.9050406@grosser.es> <4E160537-3556-4BBD-A406-AD5EF00D1DF0@apple.com>
Message-ID: <4BCF6E60-5448-489B-8608-BAEF98904A3C@apple.com>

On Jul 5, 2013, at 10:43 AM, Stéphane Letz wrote
>>
>> 1) "entry" block is the first block of the function right?
>
> Yes.
>
>>
>> 2) do you mean *all* "alloca" in a function always have to be in the first entry block?
>
> If you want them converted into ssa variables early on, yes.
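[Editor's note: to make the entry-block requirement discussed above concrete, here is a minimal hand-written IR fragment, not taken from the thread; the function and value names are invented, and it uses the LLVM 3.3-era typed-pointer syntax. mem2reg/SROA will promote %i below because its alloca sits in the entry block, but will leave %j untouched because its alloca is inside the loop.]

```llvm
define void @example(i32 %n) {
entry:
  %i = alloca i32            ; in the entry block: promoted to an SSA value by mem2reg
  store i32 0, i32* %i
  br label %loop

loop:
  %j = alloca i32            ; NOT in the entry block: mem2reg ignores it
  %iv = load i32* %i
  %iv.next = add i32 %iv, 1
  store i32 %iv.next, i32* %i
  %cmp = icmp slt i32 %iv.next, %n
  br i1 %cmp, label %loop, label %exit

exit:
  ret void
}
```

Once every alloca is in the entry block, mem2reg can rewrite the variable as a clean induction phi, which is exactly what the loop vectorizer needs to recognize.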
From letz at grame.fr Fri Jul 5 08:50:00 2013
From: letz at grame.fr (=?windows-1252?Q?St=E9phane_Letz?=)
Date: Fri, 5 Jul 2013 17:50:00 +0200
Subject: [LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR
In-Reply-To: <4BCF6E60-5448-489B-8608-BAEF98904A3C@apple.com>
References: <51D62B4E.9050406@grosser.es> <4E160537-3556-4BBD-A406-AD5EF00D1DF0@apple.com> <4BCF6E60-5448-489B-8608-BAEF98904A3C@apple.com>
Message-ID: 

On Jul 5, 2013, at 17:48, Arnold Schwaighofer wrote:

>
> On Jul 5, 2013, at 10:43 AM, Stéphane Letz wrote
>>
>> 1) "entry" block is the first block of the function right?
>
> Yes.

OK

>
>>
>> 2) do you mean *all* "alloca" in a function always have to be in the first entry block?
>
> If you want them converted into ssa variables early on, yes.
>

Is this documented somewhere?

Thanks.

Stéphane

From aschwaighofer at apple.com Fri Jul 5 09:02:09 2013
From: aschwaighofer at apple.com (Arnold Schwaighofer)
Date: Fri, 05 Jul 2013 11:02:09 -0500
Subject: [LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR
In-Reply-To: 
References: <51D62B4E.9050406@grosser.es> <4E160537-3556-4BBD-A406-AD5EF00D1DF0@apple.com> <4BCF6E60-5448-489B-8608-BAEF98904A3C@apple.com>
Message-ID: <09AB6DD9-B411-45BA-8EE5-9D83DFC603CF@apple.com>

On Jul 5, 2013, at 10:50 AM, Stéphane Letz wrote:
>>>
>>> 2) do you mean *all* "alloca" in a function always have to be in the first entry block?
>>
>> If you want them converted into ssa variables early on, yes.
>>
>
>
> Is this documented somewhere?

Sort of, here: http://llvm.org/docs/tutorial/LangImpl7.html#memory-in-llvm

"The mem2reg pass implements the standard “iterated dominance frontier” algorithm for constructing SSA form and has a number of optimizations that speed up (very common) degenerate cases. The mem2reg optimization pass is the answer to dealing with mutable variables, and we highly recommend that you depend on it.
Note that mem2reg only works on variables in certain circumstances:

• mem2reg is alloca-driven: it looks for allocas and if it can handle them, it promotes them. It does not apply to global variables or heap allocations.

• mem2reg only looks for alloca instructions in the entry block of the function. Being in the entry block guarantees that the alloca is only executed once, which makes analysis simpler.”

From ahmed.bougacha at gmail.com Fri Jul 5 09:17:03 2013
From: ahmed.bougacha at gmail.com (Ahmed Bougacha)
Date: Fri, 5 Jul 2013 09:17:03 -0700
Subject: [LLVMdev] Hi, people, I propose to move Debug and Object File related headers out of Support
In-Reply-To: <466D0A86-6789-4BD7-9F83-2AF5C5B145A2@gmail.com>
References: <521318E8-FAAC-41D4-88BA-67CDBA0704A7@gmail.com> <466D0A86-6789-4BD7-9F83-2AF5C5B145A2@gmail.com>
Message-ID: 

On Thu, Jul 4, 2013 at 10:21 PM, Charles Davis wrote:
>
> On Jul 4, 2013, at 10:22 PM, 罗勇刚(Yonggang Luo) wrote:
>
> > On 2013-7-4 at 8:53 PM, "Charles Davis" wrote:
> >
> >
> > On Jul 4, 2013, at 1:43 AM, 罗勇刚(Yonggang Luo) wrote:
> >
> > > LLVM is a modularized software system, so I hope it's was modularized,
> > > And ELF.h is definitely belongs to Object by classification,
> > > The things that confused me is the ELF.h was placed under Support and
> > > don't know why.
> > Because it's also used by the MC layer's direct object emission support.
> thanks for your response, did MC layer's direct object emission depends on
> Object library?
>
> Nope. The Object library only reads object files. MC, on the other hand,
> only writes object files.
>

That's true for object emission, but MC already depends on libObject, and
at least some of the more experimental parts of MC actually use it to read
files.

However - I believe nobody mentioned this - I think the point of having
ELF.h and friends in Support is that they're intended to (more or less
exactly) match the headers provided by the system.
Most of what's there is duplicated in a friendlier form (for
ObjectFile-related stuff), or just to match the LLVM style (notably
Object/MachOFormat.h) in Object.

-- Ahmed Bougacha

> Chip
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From cdavis5x at gmail.com Fri Jul 5 10:18:19 2013
From: cdavis5x at gmail.com (Charles Davis)
Date: Fri, 5 Jul 2013 11:18:19 -0600
Subject: [LLVMdev] Hi, people, I propose to move Debug and Object File related headers out of Support
In-Reply-To: 
References: <521318E8-FAAC-41D4-88BA-67CDBA0704A7@gmail.com> <466D0A86-6789-4BD7-9F83-2AF5C5B145A2@gmail.com>
Message-ID: 

On Jul 5, 2013, at 10:17 AM, Ahmed Bougacha wrote:

> On Thu, Jul 4, 2013 at 10:21 PM, Charles Davis wrote:
>
> On Jul 4, 2013, at 10:22 PM, 罗勇刚(Yonggang Luo) wrote:
>>
>> On 2013-7-4 at 8:53 PM, "Charles Davis" wrote:
>> >
>> >
>> > On Jul 4, 2013, at 1:43 AM, 罗勇刚(Yonggang Luo) wrote:
>> >
>> > > LLVM is a modularized software system, so I hope it's was modularized,
>> > > And ELF.h is definitely belongs to Object by classification,
>> > > The things that confused me is the ELF.h was placed under Support and
>> > > don't know why.
>> > Because it's also used by the MC layer's direct object emission support.
>> thanks for your response, did MC layer's direct object emission depends on Object library?
>>
> Nope. The Object library only reads object files. MC, on the other hand, only writes object files.
>
> That's true for object emission, but MC already depends on libObject,

Huh. You're right, looking at MC's LLVMBuild.txt file.

> and at least some of the more experimental parts of MC actually use it to read files.

That's right, the new object disassembler stuff (that you are working on, if I'm not mistaken!) that was just added needs libObject.
But as for debug info, it is true that DebugInfo only knows how to read it, and CodeGen knows how to write it (but not read it). So for now, the DWARF stuff at least definitely needs to stay in Support. > > However (I believe nobody mentioned this), I think the point of having ELF.h and friends in Support is that they’re intended to (more or less exactly) match the headers provided by the system. Agreed, though I'm not sure if that justifies them being in Support instead of Object. > Most of what’s there is duplicated in Object, either in a friendlier form (for the ObjectFile-related stuff) or just restyled to match the LLVM conventions (notably Object/MachOFormat.h). I have to say, I believe that this duplication is an (admittedly minor) problem, because now at least Support/MachO.h and Object/MachOFormat.h are out of sync. I think we should get rid of one in favor of the other. Personally, I want to keep the former and dump the latter, but LLVM mostly uses the latter when it needs to work with Mach-O files, so I'm not sure how the rest of the community feels: which header stays and which one goes, where the header should go, or if this is even a bike shed they'd want to repaint. Chip > > -- Ahmed Bougacha > > Chip > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nicholas at mxc.ca Fri Jul 5 12:43:28 2013 From: nicholas at mxc.ca (Nick Lewycky) Date: Fri, 05 Jul 2013 12:43:28 -0700 Subject: [LLVMdev] Any suggestion for "Unknown instruction type encountered" error? In-Reply-To: References: Message-ID: <51D721E0.7060209@mxc.ca> hacker cling wrote: > Hello all, > I was playing with an LLVM pass. I changed the > lib/Transforms/Hello/Hello.cpp's content to be my own pass. Then I make > install the pass and use an example test1.c to see whether it works or > not.
When I run the example using the following command: > clang -emit-llvm test1.c -c -o test1.bc > opt -load ../build_llvm/Debug+Asserts/lib/LLVMHello.so -hello < test1.bc > > /dev/null > > It shows the following error: > > Unknown instruction type encountered! > UNREACHABLE executed at include/llvm/InstVisitor.h:120! The error message here means that the type of instruction (alloca, add, sub, load, store, etc.) is one that did not exist in the compiler at the time that opt was built. This is a static list in the compiler, the same one documented at llvm.org/docs/LangRef.html. My guess is that your .bc file is from a very different version of llvm than your 'opt' binary. Perhaps you're using a clang installed on the system and an opt you built yourself? We sometimes change the instruction numbering for the encoding in the .bc files. Does 'opt -verify < test1.bc' work? Another alternative is that you've managed to form invalid IR in some other way, or corrupted memory. Have you tried building 'opt' and LLVMHello.so with debug (so that llvm's assertions are enabled)? And what about running the 'opt' command under valgrind?
Nick > 0 opt 0x00000000014190b6 > llvm::sys::PrintStackTrace(_IO_FILE*) + 38 > 1 opt 0x0000000001419333 > 2 opt 0x0000000001418d8b > 3 libpthread.so.0 0x0000003aa600f500 > 4 libc.so.6 0x0000003aa5c328a5 gsignal + 53 > 5 libc.so.6 0x0000003aa5c34085 abort + 373 > 6 opt 0x000000000140089b > 7 LLVMHello.so 0x00007f889beb5833 > 8 LLVMHello.so 0x00007f889beb57bd > 9 LLVMHello.so 0x00007f889beb575e > 10 LLVMHello.so 0x00007f889beb56c5 > 11 LLVMHello.so 0x00007f889beb55f2 > 12 LLVMHello.so 0x00007f889beb5401 > 13 opt 0x00000000013a4e21 > llvm::FPPassManager::runOnFunction(llvm::Function&) + 393 > 14 opt 0x00000000013a5021 > llvm::FPPassManager::runOnModule(llvm::Module&) + 89 > 15 opt 0x00000000013a5399 > llvm::MPPassManager::runOnModule(llvm::Module&) + 573 > 16 opt 0x00000000013a59a8 > llvm::PassManagerImpl::run(llvm::Module&) + 254 > 17 opt 0x00000000013a5bbf > llvm::PassManager::run(llvm::Module&) + 39 > 18 opt 0x000000000084b455 main + 5591 > 19 libc.so.6 0x0000003aa5c1ecdd __libc_start_main + 253 > 20 opt 0x000000000083d359 > Stack dump: > 0. Program arguments: opt -load > ../build_llvm/Debug+Asserts/lib/LLVMHello.so -hello > 1. Running pass 'Function Pass Manager' on module ''. > 2. Running pass 'Hello Pass' on function '@main' > > I will illustrate the pass code, the test1 example, and the IR generated > below, so that anyone could help me or give me some suggestion. Thanks. 
> > The Hello.cpp pass is as follows: > > #define DEBUG_TYPE "hello" > #include "llvm/Pass.h" > #include "llvm/IR/Module.h" > #include "llvm/InstVisitor.h" > #include "llvm/IR/Constants.h" > #include "llvm/IR/IRBuilder.h" > #include "llvm/Support/raw_ostream.h" > namespace { > > struct Hello : public llvm::FunctionPass, llvm::InstVisitor<Hello> { > private: > llvm::BasicBlock *FailBB; > public: > static char ID; > Hello() : llvm::FunctionPass(ID) { FailBB = 0; } > > > virtual bool runOnFunction(llvm::Function &F) { > > visit(F); > return false; > } > > llvm::BasicBlock *getTrapBB(llvm::Instruction &Inst) { > > if (FailBB) return FailBB; > llvm::Function *Fn = Inst.getParent()->getParent(); > > llvm::LLVMContext &ctx = Fn->getContext(); > llvm::IRBuilder<> builder(ctx); You don't seem to ever use this builder? > > FailBB = llvm::BasicBlock::Create(ctx, "FailBlock", Fn); > llvm::ReturnInst::Create(Fn->getContext(), FailBB); > return FailBB; > > } > void visitLoadInst(llvm::LoadInst &LI) { > } > > void visitStoreInst(llvm::StoreInst &SI) { > llvm::Value *Addr = SI.getOperand(1); > llvm::PointerType *PTy = llvm::cast<llvm::PointerType>(Addr->getType()); > llvm::Type *ElTy = PTy->getElementType(); > if (!ElTy->isPointerTy()) { > llvm::BasicBlock *OldBB = SI.getParent(); > llvm::errs() << "yes, got it\n"; > llvm::ICmpInst *Cmp = new llvm::ICmpInst(&SI, llvm::CmpInst::ICMP_EQ, Addr, llvm::Constant::getNullValue(Addr->getType()), ""); > > llvm::Instruction *Iter = &SI; > OldBB->getParent()->dump(); > llvm::BasicBlock *NewBB = OldBB->splitBasicBlock(Iter, "newBlock"); > OldBB->getParent()->dump(); > > } > > } > }; > > char Hello::ID = 0; > static llvm::RegisterPass<Hello> X("hello", "Hello Pass", false, false); > } > > The test1.c example is as follows: > > #include > void main() { > int x; > x = 5; > } > > The IR for the example after adding the pass is as follows: > define void @main() #0 { > entry: > %x = alloca i32, align 4 > %0 = icmp eq i32* %x, null > br label %newBlock > > newBlock: ; preds =
%entry > store i32 5, i32* %x, align 4 > ret void > } > > > Any suggestions? > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From vnorilo at siba.fi Fri Jul 5 14:50:26 2013 From: vnorilo at siba.fi (Vesa Norilo) Date: Sat, 06 Jul 2013 00:50:26 +0300 Subject: [LLVMdev] Building function parameter AttributeSets quickly Message-ID: Hi all, Is there an efficient way to construct an AttributeSet for the purpose of constructing a Function? The only (public) way I've found to designate the parameter index of attributes is via the addAttribute methods of the Function class or the AttributeSet. This is quite inefficient, as a new AttributeSet is constructed every time. In fact, it takes the bulk of the time my codegen consumes when dealing with large parameter sets. "HowToUseAttributes" suggests that AttrBuilder can be used as a mutable helper for this purpose. However, it seems to me that it can build the set of attributes that describe a single parameter, rather than a full set with indices. I'm essentially looking for something like: static AttributeSet get(LLVMContext &C, ArrayRef > Attrs); But it is marked private. Am I just missing the right way to do this? Thanks! Vesa From clinghacker at gmail.com Fri Jul 5 17:23:40 2013 From: clinghacker at gmail.com (hacker cling) Date: Sat, 6 Jul 2013 08:23:40 +0800 Subject: [LLVMdev] Any suggestion for "Unknown instruction type encountered" error? In-Reply-To: <51D721E0.7060209@mxc.ca> References: <51D721E0.7060209@mxc.ca> Message-ID: Hi Nick, 2013/7/6 Nick Lewycky > hacker cling wrote: > >> Hello all, >> I was playing with an LLVM pass. I changed the >> lib/Transforms/Hello/Hello.cpp's content to be my own pass. Then I make >> install the pass and use an example test1.c to see whether it works or >> not.
When I run the example using the following command: >> clang -emit-llvm test1.c -c -o test1.bc >> opt -load ../build_llvm/Debug+Asserts/lib/LLVMHello.so -hello < >> test1.bc >> > /dev/null >> >> It shows the following error: >> >> Unknown instruction type encountered! >> UNREACHABLE executed at include/llvm/InstVisitor.h:120! >> > > The error message here means that the type of instruction -- alloca, add, > sub, load, store, etc. -- is one that did not exist in the compiler at the > time that opt was built. This is a static list in the compiler, same as > documented on llvm.org/docs/LangRef.html . > > My guess is that your .bc file is from a very different version of llvm > than your 'opt' binary. Perhaps you're using a clang installed on the > system and an opt you built yourself? We sometimes change the instruction > numbering for the encoding in the .bc files. Does 'opt -verify < test1.bc' > work? > > I am sure that the versions of llvm and opt are the same, since I built them at the same time. opt -version LLVM (http://llvm.org/): LLVM version 3.3svn DEBUG build with assertions. Built Jun 12 2013 (19:47:01). Default target: x86_64-unknown-linux-gnu Host CPU: penryn clang -v clang version 3.3 (trunk 179269) Target: x86_64-unknown-linux-gnu Thread model: posix They are both based on trunk 179269. (Currently I am working on Cling( http://root.cern.ch/drupal/content/cling), which is based on LLVM and clang with version 179269). I use opt -verify < test1.bc -f There is no error output. I also use opt -verify -load ../build_llvm/Debug+Asserts/lib/LLVMHello.so -hello < test1.bc -f There is still no other error output except for the "Unknown instruction type encountered!" Another alternative is that you've managed to form invalid IR in some other > way, or corrupted memory. Have you tried building 'opt' and LLVMHello.so > with debug (so that llvm's assertions are enabled)? And what about running > the 'opt' command under valgrind?
> > I use valgrind to test it: valgrind --tool=memcheck --leak-check=full opt -load ../build_llvm/Debug+Asserts/lib/LLVMHello.so -hello < test1.bc > /dev/null ==3609== Memcheck, a memory error detector ==3609== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al. ==3609== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info ==3609== Command: opt -load ../build_llvm/Debug+Asserts/lib/LLVMHello.so -hello ==3609== define void @main() #0 { entry: %x = alloca i32, align 4 %0 = icmp eq i32* %x, null br label %newBlock newBlock: ; preds = %entry store i32 5, i32* %x, align 4 ret void } Unknown instruction type encountered! UNREACHABLE executed at /home/sploving/llvm/include/llvm/InstVisitor.h:120! 0 opt 0x00000000014190b6 llvm::sys::PrintStackTrace(_IO_FILE*) + 38 1 opt 0x0000000001419333 2 opt 0x0000000001418d8b 3 libpthread.so.0 0x0000003aa600f500 4 libc.so.6 0x0000003aa5c328a5 gsignal + 53 5 libc.so.6 0x0000003aa5c34085 abort + 373 6 opt 0x000000000140089b 7 LLVMHello.so 0x00000000050278b3 8 LLVMHello.so 0x000000000502783d 9 LLVMHello.so 0x00000000050277de 10 LLVMHello.so 0x0000000005027745 11 LLVMHello.so 0x0000000005027672 12 LLVMHello.so 0x000000000502746d 13 opt 0x00000000013a4e21 llvm::FPPassManager::runOnFunction(llvm::Function&) + 393 14 opt 0x00000000013a5021 llvm::FPPassManager::runOnModule(llvm::Module&) + 89 15 opt 0x00000000013a5399 llvm::MPPassManager::runOnModule(llvm::Module&) + 573 16 opt 0x00000000013a59a8 llvm::PassManagerImpl::run(llvm::Module&) + 254 17 opt 0x00000000013a5bbf llvm::PassManager::run(llvm::Module&) + 39 18 opt 0x000000000084b455 main + 5591 19 libc.so.6 0x0000003aa5c1ecdd __libc_start_main + 253 20 opt 0x000000000083d359 Stack dump: 0. Program arguments: opt -load ../build_llvm/Debug+Asserts/lib/LLVMHello.so -hello 1. Running pass 'Function Pass Manager' on module ''. 2. 
Running pass 'Hello Pass' on function '@main' ==3609== ==3609== HEAP SUMMARY: ==3609== in use at exit: 200,735 bytes in 574 blocks ==3609== total heap usage: 2,340 allocs, 1,766 frees, 476,703 bytes allocated ==3609== ==3609== 26 bytes in 1 blocks are possibly lost in loss record 106 of 535 ==3609== at 0x4A075BC: operator new(unsigned long) (vg_replace_malloc.c:298) ==3609== by 0x33B2A9C3C8: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CDE4: ??? (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CF32: std::basic_string, std::allocator >::basic_string(char const*, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x85A69A: void llvm::cl::initializer::apply > >(llvm::cl::opt >&) const (CommandLine.h:290) ==3609== by 0x85912A: void llvm::cl::applicator >::opt > >(llvm::cl::initializer const&, llvm::cl::opt >&) (CommandLine.h:974) ==3609== by 0x856B05: void llvm::cl::apply, llvm::cl::opt > >(llvm::cl::initializer const&, llvm::cl::opt >*) (CommandLine.h:1012) ==3609== by 0x8532FC: llvm::cl::opt >::opt, llvm::cl::value_desc>(llvm::cl::FormattingFlags const&, llvm::cl::desc const&, llvm::cl::initializer const&, llvm::cl::value_desc const&) (CommandLine.h:1195) ==3609== by 0x84C9CA: __static_initialization_and_destruction_0(int, int) (opt.cpp:62) ==3609== by 0x84D5F6: global constructors keyed to opt.cpp (opt.cpp:831) ==3609== by 0x1443675: ??? (in /usr/local/bin/opt) ==3609== by 0x83C282: ??? 
(in /usr/local/bin/opt) ==3609== ==3609== 26 bytes in 1 blocks are possibly lost in loss record 107 of 535 ==3609== at 0x4A075BC: operator new(unsigned long) (vg_replace_malloc.c:298) ==3609== by 0x33B2A9C3C8: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9DDA9: std::string::_M_mutate(unsigned long, unsigned long, unsigned long) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9DF6B: std::string::_M_replace_safe(unsigned long, unsigned long, char const*, unsigned long) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x856E61: void llvm::cl::opt_storage::setValue(char const (&) [2], bool) (CommandLine.h:1070) ==3609== by 0x854129: std::string& llvm::cl::opt >::operator=(char const (&) [2]) (CommandLine.h:1166) ==3609== by 0x84A1AB: main (opt.cpp:613) ==3609== ==3609== 26 bytes in 1 blocks are possibly lost in loss record 108 of 535 ==3609== at 0x4A075BC: operator new(unsigned long) (vg_replace_malloc.c:298) ==3609== by 0x33B2A9C3C8: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CDE4: ??? (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CF32: std::basic_string, std::allocator >::basic_string(char const*, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x143030A: llvm::tool_output_file::CleanupInstaller::CleanupInstaller(char const*) (ToolOutputFile.cpp:19) ==3609== by 0x14304A0: llvm::tool_output_file::tool_output_file(char const*, std::string&, unsigned int) (ToolOutputFile.cpp:39) ==3609== by 0x84A1F0: main (opt.cpp:617) ==3609== ==3609== 29 bytes in 1 blocks are possibly lost in loss record 151 of 535 ==3609== at 0x4A075BC: operator new(unsigned long) (vg_replace_malloc.c:298) ==3609== by 0x33B2A9C3C8: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CDE4: ??? 
(in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CF32: std::basic_string, std::allocator >::basic_string(char const*, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0xC0722A: void llvm::cl::initializer::apply > >(llvm::cl::opt >&) const (CommandLine.h:290) ==3609== by 0xC071A3: void llvm::cl::applicator >::opt > >(llvm::cl::initializer const&, llvm::cl::opt >&) (CommandLine.h:974) ==3609== by 0xC06FB6: void llvm::cl::apply, llvm::cl::opt > >(llvm::cl::initializer const&, llvm::cl::opt >*) (CommandLine.h:1012) ==3609== by 0xC06BEE: llvm::cl::opt >::opt, llvm::cl::OptionHidden>(char const (&) [24], llvm::cl::desc const&, llvm::cl::initializer const&, llvm::cl::OptionHidden const&) (CommandLine.h:1195) ==3609== by 0xC06492: __static_initialization_and_destruction_0(int, int) (PostRASchedulerList.cpp:66) ==3609== by 0xC06624: global constructors keyed to PostRASchedulerList.cpp (PostRASchedulerList.cpp:776) ==3609== by 0x1443675: ??? (in /usr/local/bin/opt) ==3609== by 0x83C282: ??? (in /usr/local/bin/opt) ==3609== ==3609== 29 bytes in 1 blocks are possibly lost in loss record 152 of 535 ==3609== at 0x4A075BC: operator new(unsigned long) (vg_replace_malloc.c:298) ==3609== by 0x33B2A9C3C8: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CDE4: ??? 
(in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CF32: std::basic_string, std::allocator >::basic_string(char const*, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0xC0722A: void llvm::cl::initializer::apply > >(llvm::cl::opt >&) const (CommandLine.h:290) ==3609== by 0xC071A3: void llvm::cl::applicator >::opt > >(llvm::cl::initializer const&, llvm::cl::opt >&) (CommandLine.h:974) ==3609== by 0xC06FB6: void llvm::cl::apply, llvm::cl::opt > >(llvm::cl::initializer const&, llvm::cl::opt >*) (CommandLine.h:1012) ==3609== by 0xFBEDA9: llvm::cl::opt >::opt, llvm::cl::OptionHidden, llvm::cl::ValueExpected>(char const (&) [21], llvm::cl::initializer const&, llvm::cl::OptionHidden const&, llvm::cl::ValueExpected const&) (CommandLine.h:1195) ==3609== by 0xFBEB5D: __static_initialization_and_destruction_0(int, int) (GCOVProfiling.cpp:46) ==3609== by 0xFBEBF4: global constructors keyed to GCOVProfiling.cpp (GCOVProfiling.cpp:855) ==3609== by 0x1443675: ??? (in /usr/local/bin/opt) ==3609== by 0x83C282: ??? (in /usr/local/bin/opt) ==3609== ==3609== 32 bytes in 1 blocks are possibly lost in loss record 199 of 535 ==3609== at 0x4A075BC: operator new(unsigned long) (vg_replace_malloc.c:298) ==3609== by 0x33B2A9C3C8: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CDE4: ??? 
(in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CF6A: std::basic_string, std::allocator >::basic_string(char const*, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x84D7CF: llvm::StringRef::str() const (StringRef.h:183) ==3609== by 0x84D80D: llvm::StringRef::operator std::string() const (StringRef.h:200) ==3609== by 0x139B32C: llvm::Module::Module(llvm::StringRef, llvm::LLVMContext&) (Module.cpp:46) ==3609== by 0x12926C1: llvm::getLazyBitcodeModule(llvm::MemoryBuffer*, llvm::LLVMContext&, std::string*) (BitcodeReader.cpp:3038) ==3609== by 0x12928B5: llvm::ParseBitcodeFile(llvm::MemoryBuffer*, llvm::LLVMContext&, std::string*) (BitcodeReader.cpp:3078) ==3609== by 0x123674D: llvm::ParseIR(llvm::MemoryBuffer*, llvm::SMDiagnostic&, llvm::LLVMContext&) (IRReader.cpp:67) ==3609== by 0x1236A0F: llvm::ParseIRFile(std::string const&, llvm::SMDiagnostic&, llvm::LLVMContext&) (IRReader.cpp:88) ==3609== by 0x84A036: main (opt.cpp:593) ==3609== ==3609== 37 bytes in 1 blocks are possibly lost in loss record 219 of 535 ==3609== at 0x4A075BC: operator new(unsigned long) (vg_replace_malloc.c:298) ==3609== by 0x33B2A9C3C8: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CDE4: ??? 
(in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CF32: std::basic_string, std::allocator >::basic_string(char const*, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x1132DB0: void llvm::cl::initializer::apply > >(llvm::cl::opt >&) const (CommandLine.h:290) ==3609== by 0x11317C9: void llvm::cl::applicator >::opt > >(llvm::cl::initializer const&, llvm::cl::opt >&) (CommandLine.h:974) ==3609== by 0x1130EB8: void llvm::cl::apply, llvm::cl::opt > >(llvm::cl::initializer const&, llvm::cl::opt >*) (CommandLine.h:1012) ==3609== by 0x11307F9: llvm::cl::opt >::opt, llvm::cl::value_desc, llvm::cl::desc, llvm::cl::OptionHidden>(char const (&) [25], llvm::cl::initializer const&, llvm::cl::value_desc const&, llvm::cl::desc const&, llvm::cl::OptionHidden const&) (CommandLine.h:1202) ==3609== by 0x1130663: __static_initialization_and_destruction_0(int, int) (PathProfileInfo.cpp:32) ==3609== by 0x113070E: global constructors keyed to PathProfileInfo.cpp (PathProfileInfo.cpp:433) ==3609== by 0x1443675: ??? (in /usr/local/bin/opt) ==3609== by 0x83C282: ??? (in /usr/local/bin/opt) ==3609== ==3609== 37 bytes in 1 blocks are possibly lost in loss record 220 of 535 ==3609== at 0x4A075BC: operator new(unsigned long) (vg_replace_malloc.c:298) ==3609== by 0x33B2A9C3C8: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CDE4: ??? 
(in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CF32: std::basic_string, std::allocator >::basic_string(char const*, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x1132DB0: void llvm::cl::initializer::apply > >(llvm::cl::opt >&) const (CommandLine.h:290) ==3609== by 0x11317C9: void llvm::cl::applicator >::opt > >(llvm::cl::initializer const&, llvm::cl::opt >&) (CommandLine.h:974) ==3609== by 0x1130EB8: void llvm::cl::apply, llvm::cl::opt > >(llvm::cl::initializer const&, llvm::cl::opt >*) (CommandLine.h:1012) ==3609== by 0x115B6F9: llvm::cl::opt >::opt, llvm::cl::value_desc, llvm::cl::desc>(char const (&) [18], llvm::cl::initializer const&, llvm::cl::value_desc const&, llvm::cl::desc const&) (CommandLine.h:1195) ==3609== by 0x115B4D9: __static_initialization_and_destruction_0(int, int) (ProfileInfoLoaderPass.cpp:37) ==3609== by 0x115B5B6: global constructors keyed to ProfileInfoLoaderPass.cpp (ProfileInfoLoaderPass.cpp:267) ==3609== by 0x1443675: ??? (in /usr/local/bin/opt) ==3609== by 0x83C282: ??? (in /usr/local/bin/opt) ==3609== ==3609== 37 bytes in 1 blocks are possibly lost in loss record 221 of 535 ==3609== at 0x4A075BC: operator new(unsigned long) (vg_replace_malloc.c:298) ==3609== by 0x33B2A9C3C8: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CDE4: ??? 
(in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CF32: std::basic_string, std::allocator >::basic_string(char const*, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x1132DB0: void llvm::cl::initializer::apply > >(llvm::cl::opt >&) const (CommandLine.h:290) ==3609== by 0x11317C9: void llvm::cl::applicator >::opt > >(llvm::cl::initializer const&, llvm::cl::opt >&) (CommandLine.h:974) ==3609== by 0x1130EB8: void llvm::cl::apply, llvm::cl::opt > >(llvm::cl::initializer const&, llvm::cl::opt >*) (CommandLine.h:1012) ==3609== by 0x113C091: llvm::cl::opt >::opt, llvm::cl::value_desc, llvm::cl::desc>(char const (&) [13], llvm::cl::initializer const&, llvm::cl::value_desc const&, llvm::cl::desc const&) (CommandLine.h:1195) ==3609== by 0x113BE9D: __static_initialization_and_destruction_0(int, int) (ProfileDataLoaderPass.cpp:42) ==3609== by 0x113BF44: global constructors keyed to ProfileDataLoaderPass.cpp (ProfileDataLoaderPass.cpp:188) ==3609== by 0x1443675: ??? (in /usr/local/bin/opt) ==3609== by 0x83C282: ??? (in /usr/local/bin/opt) ==3609== ==3609== 43 bytes in 1 blocks are possibly lost in loss record 240 of 535 ==3609== at 0x4A075BC: operator new(unsigned long) (vg_replace_malloc.c:298) ==3609== by 0x33B2A9C3C8: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CDE4: ??? 
(in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CF32: std::basic_string, std::allocator >::basic_string(char const*, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0xBFEB42: void llvm::cl::initializer::apply > >(llvm::cl::opt >&) const (CommandLine.h:290) ==3609== by 0xBFE447: void llvm::cl::applicator >::opt > >(llvm::cl::initializer const&, llvm::cl::opt >&) (CommandLine.h:974) ==3609== by 0xBFDDBD: void llvm::cl::apply, llvm::cl::opt > >(llvm::cl::initializer const&, llvm::cl::opt >*) (CommandLine.h:1012) ==3609== by 0xBFD782: llvm::cl::opt >::opt >(char const (&) [20], llvm::cl::ValueExpected const&, llvm::cl::desc const&, llvm::cl::value_desc const&, llvm::cl::initializer const&) (CommandLine.h:1203) ==3609== by 0xBFCC53: __static_initialization_and_destruction_0(int, int) (Passes.cpp:86) ==3609== by 0xBFCD8F: global constructors keyed to Passes.cpp (Passes.cpp:760) ==3609== by 0x1443675: ??? (in /usr/local/bin/opt) ==3609== by 0x83C282: ??? (in /usr/local/bin/opt) ==3609== ==3609== 49 bytes in 1 blocks are possibly lost in loss record 249 of 535 ==3609== at 0x4A075BC: operator new(unsigned long) (vg_replace_malloc.c:298) ==3609== by 0x33B2A9C3C8: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CDE4: ??? 
(in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CF6A: std::basic_string, std::allocator >::basic_string(char const*, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x84D7CF: llvm::StringRef::str() const (StringRef.h:183) ==3609== by 0x84D80D: llvm::StringRef::operator std::string() const (StringRef.h:200) ==3609== by 0x84F356: llvm::Module::setTargetTriple(llvm::StringRef) (Module.h:265) ==3609== by 0x128AD61: llvm::BitcodeReader::ParseModule(bool) (BitcodeReader.cpp:1613) ==3609== by 0x128C026: llvm::BitcodeReader::ParseBitcodeInto(llvm::Module*) (BitcodeReader.cpp:1826) ==3609== by 0x1292712: llvm::getLazyBitcodeModule(llvm::MemoryBuffer*, llvm::LLVMContext&, std::string*) (BitcodeReader.cpp:3041) ==3609== by 0x12928B5: llvm::ParseBitcodeFile(llvm::MemoryBuffer*, llvm::LLVMContext&, std::string*) (BitcodeReader.cpp:3078) ==3609== by 0x123674D: llvm::ParseIR(llvm::MemoryBuffer*, llvm::SMDiagnostic&, llvm::LLVMContext&) (IRReader.cpp:67) ==3609== ==3609== 49 bytes in 1 blocks are possibly lost in loss record 250 of 535 ==3609== at 0x4A075BC: operator new(unsigned long) (vg_replace_malloc.c:298) ==3609== by 0x33B2A9C3C8: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CDE4: ??? 
(in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CF6A: std::basic_string, std::allocator >::basic_string(char const*, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x84D7CF: llvm::StringRef::str() const (StringRef.h:183) ==3609== by 0x84D80D: llvm::StringRef::operator std::string() const (StringRef.h:200) ==3609== by 0x11CB6F6: llvm::TargetMachine::TargetMachine(llvm::Target const&, llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&) (TargetMachine.cpp:58) ==3609== by 0xB960AF: llvm::LLVMTargetMachine::LLVMTargetMachine(llvm::Target const&, llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions, llvm::Reloc::Model, llvm::CodeModel::Model, llvm::CodeGenOpt::Level) (LLVMTargetMachine.cpp:70) ==3609== by 0x861341: llvm::X86TargetMachine::X86TargetMachine(llvm::Target const&, llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, llvm::Reloc::Model, llvm::CodeModel::Model, llvm::CodeGenOpt::Level, bool) (X86TargetMachine.cpp:85) ==3609== by 0x86119B: llvm::X86_64TargetMachine::X86_64TargetMachine(llvm::Target const&, llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, llvm::Reloc::Model, llvm::CodeModel::Model, llvm::CodeGenOpt::Level) (X86TargetMachine.cpp:71) ==3609== by 0x863564: llvm::RegisterTargetMachine::Allocator(llvm::Target const&, llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, llvm::Reloc::Model, llvm::CodeModel::Model, llvm::CodeGenOpt::Level) (TargetRegistry.h:1015) ==3609== by 0x84FA5E: llvm::Target::createTargetMachine(llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, llvm::Reloc::Model, llvm::CodeModel::Model, llvm::CodeGenOpt::Level) const (TargetRegistry.h:340) ==3609== ==3609== 49 bytes in 1 blocks are possibly lost in loss record 251 of 535 ==3609== at 0x4A075BC: operator new(unsigned long) (vg_replace_malloc.c:298) ==3609== by 0x33B2A9C3C8: 
std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CDE4: ??? (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CF6A: std::basic_string, std::allocator >::basic_string(char const*, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x84D7CF: llvm::StringRef::str() const (StringRef.h:183) ==3609== by 0x84D80D: llvm::StringRef::operator std::string() const (StringRef.h:200) ==3609== by 0x8613AE: llvm::X86TargetMachine::X86TargetMachine(llvm::Target const&, llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, llvm::Reloc::Model, llvm::CodeModel::Model, llvm::CodeGenOpt::Level, bool) (X86TargetMachine.cpp:85) ==3609== by 0x86119B: llvm::X86_64TargetMachine::X86_64TargetMachine(llvm::Target const&, llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, llvm::Reloc::Model, llvm::CodeModel::Model, llvm::CodeGenOpt::Level) (X86TargetMachine.cpp:71) ==3609== by 0x863564: llvm::RegisterTargetMachine::Allocator(llvm::Target const&, llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, llvm::Reloc::Model, llvm::CodeModel::Model, llvm::CodeGenOpt::Level) (TargetRegistry.h:1015) ==3609== by 0x84FA5E: llvm::Target::createTargetMachine(llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, llvm::Reloc::Model, llvm::CodeModel::Model, llvm::CodeGenOpt::Level) const (TargetRegistry.h:340) ==3609== by 0x849E3C: GetTargetMachine(llvm::Triple) (opt.cpp:548) ==3609== by 0x84A553: main (opt.cpp:658) ==3609== ==3609== 49 bytes in 1 blocks are possibly lost in loss record 252 of 535 ==3609== at 0x4A075BC: operator new(unsigned long) (vg_replace_malloc.c:298) ==3609== by 0x33B2A9C3C8: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CDE4: ??? 
(in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CF6A: std::basic_string, std::allocator >::basic_string(char const*, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x84D7CF: llvm::StringRef::str() const (StringRef.h:183) ==3609== by 0x84D80D: llvm::StringRef::operator std::string() const (StringRef.h:200) ==3609== by 0x1220A16: llvm::MCSubtargetInfo::InitMCSubtargetInfo(llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::SubtargetFeatureKV const*, llvm::SubtargetFeatureKV const*, llvm::SubtargetInfoKV const*, llvm::MCWriteProcResEntry const*, llvm::MCWriteLatencyEntry const*, llvm::MCReadAdvanceEntry const*, llvm::InstrStage const*, unsigned int const*, unsigned int const*, unsigned int, unsigned int) (MCSubtargetInfo.cpp:48) ==3609== by 0x956879: llvm::X86GenSubtargetInfo::X86GenSubtargetInfo(llvm::StringRef, llvm::StringRef, llvm::StringRef) (X86GenSubtargetInfo.inc:2102) ==3609== by 0x957FBD: llvm::X86Subtarget::X86Subtarget(std::string const&, std::string const&, std::string const&, unsigned int, bool) (X86Subtarget.cpp:483) ==3609== by 0x8613D6: llvm::X86TargetMachine::X86TargetMachine(llvm::Target const&, llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, llvm::Reloc::Model, llvm::CodeModel::Model, llvm::CodeGenOpt::Level, bool) (X86TargetMachine.cpp:85) ==3609== by 0x86119B: llvm::X86_64TargetMachine::X86_64TargetMachine(llvm::Target const&, llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, llvm::Reloc::Model, llvm::CodeModel::Model, llvm::CodeGenOpt::Level) (X86TargetMachine.cpp:71) ==3609== by 0x863564: llvm::RegisterTargetMachine::Allocator(llvm::Target const&, llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, llvm::Reloc::Model, llvm::CodeModel::Model, llvm::CodeGenOpt::Level) (TargetRegistry.h:1015) ==3609== ==3609== 50 bytes in 1 blocks are possibly lost in loss record 253 of 535 ==3609== at 0x4A075BC: operator 
new(unsigned long) (vg_replace_malloc.c:298) ==3609== by 0x33B2A9C3C8: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CDE4: ??? (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CF32: std::basic_string, std::allocator >::basic_string(char const*, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x1138894: void llvm::cl::initializer::apply > >(llvm::cl::opt >&) const (CommandLine.h:290) ==3609== by 0x1137382: void llvm::cl::applicator >::opt > >(llvm::cl::initializer const&, llvm::cl::opt >&) (CommandLine.h:974) ==3609== by 0x1136AF0: void llvm::cl::apply, llvm::cl::opt > >(llvm::cl::initializer const&, llvm::cl::opt >*) (CommandLine.h:1012) ==3609== by 0x1136347: llvm::cl::opt >::opt, llvm::cl::value_desc, llvm::cl::desc, llvm::cl::OptionHidden>(char const (&) [27], llvm::cl::initializer const&, llvm::cl::value_desc const&, llvm::cl::desc const&, llvm::cl::OptionHidden const&) (CommandLine.h:1202) ==3609== by 0x1136183: __static_initialization_and_destruction_0(int, int) (PathProfileVerifier.cpp:54) ==3609== by 0x113621E: global constructors keyed to PathProfileVerifier.cpp (PathProfileVerifier.cpp:206) ==3609== by 0x1443675: ??? (in /usr/local/bin/opt) ==3609== by 0x83C282: ??? (in /usr/local/bin/opt) ==3609== ==3609== 69 bytes in 1 blocks are possibly lost in loss record 435 of 535 ==3609== at 0x4A075BC: operator new(unsigned long) (vg_replace_malloc.c:298) ==3609== by 0x33B2A9C3C8: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CDE4: ??? 
(in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CF6A: std::basic_string, std::allocator >::basic_string(char const*, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x84D7CF: llvm::StringRef::str() const (StringRef.h:183) ==3609== by 0x84EC42: llvm::cl::parser::parse(llvm::cl::Option&, llvm::StringRef, llvm::StringRef, std::string&) (CommandLine.h:878) ==3609== by 0x85D961: llvm::cl::opt >::handleOccurrence(unsigned int, llvm::StringRef, llvm::StringRef) (CommandLine.h:1127) ==3609== by 0x13EFAE6: llvm::cl::Option::addOccurrence(unsigned int, llvm::StringRef, llvm::StringRef, bool) (CommandLine.cpp:883) ==3609== by 0x13ED212: CommaSeparateAndAddOccurence(llvm::cl::Option*, unsigned int, llvm::StringRef, llvm::StringRef, bool) (CommandLine.cpp:259) ==3609== by 0x13ED4EC: ProvideOption(llvm::cl::Option*, llvm::StringRef, llvm::StringRef, int, char const* const*, int&) (CommandLine.cpp:299) ==3609== by 0x13EEFCA: llvm::cl::ParseCommandLineOptions(int, char const* const*, char const*) (CommandLine.cpp:724) ==3609== by 0x849F96: main (opt.cpp:582) ==3609== ==3609== 170 bytes in 1 blocks are possibly lost in loss record 461 of 535 ==3609== at 0x4A075BC: operator new(unsigned long) (vg_replace_malloc.c:298) ==3609== by 0x33B2A9C3C8: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CDE4: ??? 
(in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CF6A: std::basic_string, std::allocator >::basic_string(char const*, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x84D7CF: llvm::StringRef::str() const (StringRef.h:183) ==3609== by 0x84D80D: llvm::StringRef::operator std::string() const (StringRef.h:200) ==3609== by 0x125522A: llvm::Module::setDataLayout(llvm::StringRef) (Module.h:262) ==3609== by 0x128AE35: llvm::BitcodeReader::ParseModule(bool) (BitcodeReader.cpp:1620) ==3609== by 0x128C026: llvm::BitcodeReader::ParseBitcodeInto(llvm::Module*) (BitcodeReader.cpp:1826) ==3609== by 0x1292712: llvm::getLazyBitcodeModule(llvm::MemoryBuffer*, llvm::LLVMContext&, std::string*) (BitcodeReader.cpp:3041) ==3609== by 0x12928B5: llvm::ParseBitcodeFile(llvm::MemoryBuffer*, llvm::LLVMContext&, std::string*) (BitcodeReader.cpp:3078) ==3609== by 0x123674D: llvm::ParseIR(llvm::MemoryBuffer*, llvm::SMDiagnostic&, llvm::LLVMContext&) (IRReader.cpp:67) ==3609== ==3609== 208 bytes in 7 blocks are possibly lost in loss record 467 of 535 ==3609== at 0x4A075BC: operator new(unsigned long) (vg_replace_malloc.c:298) ==3609== by 0x33B2A9C3C8: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CDE4: ??? 
(in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CF6A: std::basic_string, std::allocator >::basic_string(char const*, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x84D7CF: llvm::StringRef::str() const (StringRef.h:183) ==3609== by 0x84D80D: llvm::StringRef::operator std::string() const (StringRef.h:200) ==3609== by 0x12B9552: llvm::StringAttributeEntry::StringAttributeEntry(llvm::StringRef, llvm::StringRef) (AttributeImpl.h:87) ==3609== by 0x12B56C0: llvm::AttributeImpl::AttributeImpl(llvm::LLVMContext&, llvm::StringRef, llvm::StringRef) (Attributes.cpp:288) ==3609== by 0x12B4048: llvm::Attribute::get(llvm::LLVMContext&, llvm::StringRef, llvm::StringRef) (Attributes.cpp:68) ==3609== by 0x12B6BA4: llvm::AttributeSet::get(llvm::LLVMContext&, unsigned int, llvm::AttrBuilder&) (Attributes.cpp:619) ==3609== by 0x12856AC: llvm::BitcodeReader::ParseAttributeGroupBlock() (BitcodeReader.cpp:577) ==3609== by 0x128A9D0: llvm::BitcodeReader::ParseModule(bool) (BitcodeReader.cpp:1534) ==3609== ==3609== 302 bytes in 7 blocks are possibly lost in loss record 471 of 535 ==3609== at 0x4A075BC: operator new(unsigned long) (vg_replace_malloc.c:298) ==3609== by 0x33B2A9C3C8: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CDE4: ??? 
(in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x33B2A9CF6A: std::basic_string, std::allocator >::basic_string(char const*, unsigned long, std::allocator const&) (in /usr/lib64/libstdc++.so.6.0.13) ==3609== by 0x84D7CF: llvm::StringRef::str() const (StringRef.h:183) ==3609== by 0x84D80D: llvm::StringRef::operator std::string() const (StringRef.h:200) ==3609== by 0x12B953B: llvm::StringAttributeEntry::StringAttributeEntry(llvm::StringRef, llvm::StringRef) (AttributeImpl.h:87) ==3609== by 0x12B56C0: llvm::AttributeImpl::AttributeImpl(llvm::LLVMContext&, llvm::StringRef, llvm::StringRef) (Attributes.cpp:288) ==3609== by 0x12B4048: llvm::Attribute::get(llvm::LLVMContext&, llvm::StringRef, llvm::StringRef) (Attributes.cpp:68) ==3609== by 0x12B6BA4: llvm::AttributeSet::get(llvm::LLVMContext&, unsigned int, llvm::AttrBuilder&) (Attributes.cpp:619) ==3609== by 0x12856AC: llvm::BitcodeReader::ParseAttributeGroupBlock() (BitcodeReader.cpp:577) ==3609== by 0x128A9D0: llvm::BitcodeReader::ParseModule(bool) (BitcodeReader.cpp:1534) ==3609== ==3609== LEAK SUMMARY: ==3609== definitely lost: 0 bytes in 0 blocks ==3609== indirectly lost: 0 bytes in 0 blocks ==3609== possibly lost: 1,317 bytes in 31 blocks ==3609== still reachable: 199,418 bytes in 543 blocks ==3609== suppressed: 0 bytes in 0 blocks ==3609== Reachable blocks (those to which a pointer was found) are not shown. ==3609== To see them, rerun with: --leak-check=full --show-reachable=yes ==3609== ==3609== For counts of detected and suppressed errors, rerun with: -v ==3609== ERROR SUMMARY: 19 errors from 19 contexts (suppressed: 6 from 6) There seems to be no invalid memory access. Do you think the errors might be because I am not using the latest LLVM version, or is this an LLVM bug? Thanks. 
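For "possibly lost" noise like the std::string blocks above, which comes from one-time static and command-line-option initialization rather than a growing leak, a Valgrind suppressions file is often the practical answer. A minimal sketch — the suppression name and file name are invented here, and real entries are best generated with `valgrind --gen-suppressions=all`:

```
# hypothetical llvm-opt.supp
{
   llvm-static-option-strings
   Memcheck:Leak
   fun:_Znwm
   fun:_ZNSs4_Rep9_S_createEmmRKSaIcE
   ...
}
```

Then rerun with `valgrind --leak-check=full --suppressions=llvm-opt.supp opt ...`; the `...` frame wildcard matches any remaining callers, so all of the string-initialization stacks above collapse into one suppressed entry.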
> Nick > > > 0 opt 0x00000000014190b6 >> llvm::sys::PrintStackTrace(_IO_FILE*) + 38 >> 1 opt 0x0000000001419333 >> 2 opt 0x0000000001418d8b >> 3 libpthread.so.0 0x0000003aa600f500 >> 4 libc.so.6 0x0000003aa5c328a5 gsignal + 53 >> 5 libc.so.6 0x0000003aa5c34085 abort + 373 >> 6 opt 0x000000000140089b >> 7 LLVMHello.so 0x00007f889beb5833 >> 8 LLVMHello.so 0x00007f889beb57bd >> 9 LLVMHello.so 0x00007f889beb575e >> 10 LLVMHello.so 0x00007f889beb56c5 >> 11 LLVMHello.so 0x00007f889beb55f2 >> 12 LLVMHello.so 0x00007f889beb5401 >> 13 opt 0x00000000013a4e21 >> llvm::FPPassManager::runOnFunction(llvm::Function&) + 393 >> 14 opt 0x00000000013a5021 >> llvm::FPPassManager::runOnModule(llvm::Module&) + 89 >> 15 opt 0x00000000013a5399 >> llvm::MPPassManager::runOnModule(llvm::Module&) + 573 >> 16 opt 0x00000000013a59a8 >> llvm::PassManagerImpl::run(llvm::Module&) + 254 >> 17 opt 0x00000000013a5bbf >> llvm::PassManager::run(llvm::Module&) + 39 >> 18 opt 0x000000000084b455 main + 5591 >> 19 libc.so.6 0x0000003aa5c1ecdd __libc_start_main + 253 >> 20 opt 0x000000000083d359 >> Stack dump: >> 0. Program arguments: opt -load >> ../build_llvm/Debug+Asserts/lib/LLVMHello.so -hello >> 1. Running pass 'Function Pass Manager' on module ''. >> 2. Running pass 'Hello Pass' on function '@main' >> >> I will illustrate the pass code, the test1 example, and the IR generated >> below, so that anyone could help me or give me some suggestion. Thanks. 
>> >> The Hello.cpp pass is as the following: >> >> #define DEBUG_TYPE "hello" >> #include "llvm/Pass.h" >> #include "llvm/IR/Module.h" >> #include "llvm/InstVisitor.h" >> #include "llvm/IR/Constants.h" >> #include "llvm/IR/IRBuilder.h" >> #include "llvm/Support/raw_ostream.h" >> namespace { >> >> struct Hello : public llvm::FunctionPass, llvm::InstVisitor<Hello> { >> private: >> llvm::BasicBlock *FailBB; >> public: >> static char ID; >> Hello() : llvm::FunctionPass(ID) {FailBB = 0;} >> >> >> virtual bool runOnFunction(llvm::Function &F) { >> >> visit(F); >> return false; >> } >> >> llvm::BasicBlock *getTrapBB(llvm::Instruction &Inst) { >> >> if (FailBB) return FailBB; >> llvm::Function *Fn = Inst.getParent()->getParent(); >> >> llvm::LLVMContext& ctx = Fn->getContext(); >> llvm::IRBuilder<> builder(ctx); >> > > You don't seem to ever use this builder? > > >> FailBB = llvm::BasicBlock::Create(ctx, "FailBlock", Fn); >> llvm::ReturnInst::Create(Fn->getContext(), FailBB); >> return FailBB; >> >> } >> void visitLoadInst(llvm::LoadInst& LI) { >> } >> >> void visitStoreInst(llvm::StoreInst& SI) { >> >> llvm::Value * Addr = SI.getOperand(1); >> llvm::PointerType* PTy = llvm::cast<llvm::PointerType>(Addr->getType()); >> llvm::Type * ElTy = PTy -> getElementType(); >> if (!ElTy->isPointerTy()) { >> llvm::BasicBlock *OldBB = SI.getParent(); >> llvm::errs() << "yes, got it \n"; >> llvm::ICmpInst *Cmp = new llvm::ICmpInst(&SI, >> llvm::CmpInst::ICMP_EQ, Addr, llvm::Constant::getNullValue(Addr->getType()), ""); >> >> llvm::Instruction *Iter = &SI; >> OldBB->getParent()->dump(); >> llvm::BasicBlock *NewBB = OldBB->splitBasicBlock(Iter, "newBlock"); >> OldBB->getParent()->dump(); >> >> } >> >> } >> }; >> >> char Hello::ID = 0; >> static llvm::RegisterPass<Hello> X("hello", "Hello Pass", false, false); >> } >> >> The test1.c example is as the following: >> >> #include <stdio.h> >> void main() { >> int x; >> x = 5; >> } >> >> The IR for the example after adding the pass is as the following: >> define void 
@main() #0 { >> entry: >> %x = alloca i32, align 4 >> %0 = icmp eq i32* %x, null >> br label %newBlock >> >> newBlock: ; preds = %entry >> store i32 5, i32* %x, align 4 >> ret void >> } >> >> >> any suggestion? >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From baldrick at free.fr Fri Jul 5 23:45:53 2013 From: baldrick at free.fr (Duncan Sands) Date: Sat, 06 Jul 2013 08:45:53 +0200 Subject: [LLVMdev] Docs question: legality of inspecting other functions in a function pass In-Reply-To: References: <1A9CAAEA-2840-459E-B860-F83D64743CC5@cl.cam.ac.uk> <51D6C435.2050400@free.fr> Message-ID: <51D7BD21.7070702@free.fr> Hi Stephen, On 05/07/13 17:14, Stephen Lin wrote: >> both dragonegg and clang (AFAIK) run some function passes on each function >> in turn as they are turned into LLVM IR. If such a function pass tried to >> inspect other functions then they won't be able to see all function bodies >> because they haven't all been output yet. And which functions do have >> bodies >> available to be inspected depends on the order in which clang decides to >> output functions, so the results of running the pass would depend on that >> order. >> >> Ciao, Duncan. > > OK, so to be clear, the docs are incomplete and need to be updated? > Just trying to get explicit confirmation before I patch this... in my opinion function passes should not be inspecting the bodies of other functions. I'm happy to review a doc patch that says this. Ciao, Duncan. 
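[Editorial note on the splitBasicBlock question earlier in this digest: the dumped IR ends with an unconditional `br label %newBlock` because `splitBasicBlock` always terminates the old block with a plain branch; the pass still has to replace that terminator with a conditional branch that actually uses `Cmp`. A sketch of the missing step — not a standalone program, it assumes the `visitStoreInst` context from the quoted pass and LLVM 3.3 headers:]

```
// Inside visitStoreInst, after the split. splitBasicBlock left OldBB ending
// in an unconditional "br label %newBlock"; drop it and branch on Cmp instead.
llvm::BasicBlock *NewBB = OldBB->splitBasicBlock(Iter, "newBlock");
OldBB->getTerminator()->eraseFromParent();          // remove the plain br
llvm::BranchInst::Create(getTrapBB(SI), NewBB, Cmp, // Addr == null -> trap block
                         OldBB);                    // otherwise fall through
```

With that terminator replaced, the null check is actually reachable, which is likely also related to the abort seen in the stack trace above.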
From morten at hue.no Sat Jul 6 00:33:20 2013 From: morten at hue.no (Morten Ofstad) Date: Sat, 6 Jul 2013 09:33:20 +0200 Subject: [LLVMdev] [cfe-dev] llvm (hence Clang) not compiling with Visual Studio 2008 In-Reply-To: References: Message-ID: There is some historical precedent for fixing the problem with VS lower_bound by changing the LLVM source - when I first got LLVM to compile with Visual Studio, patches for unsymmetric operator < were accepted into the LLVM repo, and I believe it's been done several times after that as well. m. >From: Ahmed Bougacha >Sent: Friday, July 05, 2013 1:43 AM >To: Benoit Perrot >Cc: cfe-dev at cs.uiuc.edu ; llvmdev at cs.uiuc.edu >Subject: Re: [LLVMdev] [cfe-dev] llvm (hence Clang) not compiling >with Visual Studio 2008 >On Thu, Jul 4, 2013 at 12:48 AM, Benoit Perrot > wrote: >>3. modify the code in MCModule.cpp to cope with the implementation of >>"lower_bound" in VS 2008. >> >(3) Fixed in r185676. >Requiring VS 2010 for a minor problem like this (even though there are more >like it) isn’t warranted I think. From t.p.northover at gmail.com Sat Jul 6 01:02:41 2013 From: t.p.northover at gmail.com (Tim Northover) Date: Sat, 6 Jul 2013 09:02:41 +0100 Subject: [LLVMdev] [cfe-dev] llvm (hence Clang) not compiling with Visual Studio 2008 In-Reply-To: References: Message-ID: > There is some historical precedent for fixing the problem with VS > lower_bound by changing the LLVM source - when I first got LLVM to compile > with Visual Studio, patches for unsymmetric operator < were accepted into > the LLVM repo, and I believe it's been done several times after that as > well. In the C++11 discussion back in January (http://llvm.1065342.n5.nabble.com/Using-C-11-language-features-in-LLVM-itself-td53319.html) there seemed to be some kind of consensus for 2010 being a reasonable minimum. Perhaps this is a good time to break compatibility officially. Actually, whatever did happen to using C++11? 
No-one mentioned anything about it after that thread. Tim. From swlin at post.harvard.edu Sat Jul 6 02:49:13 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Sat, 6 Jul 2013 02:49:13 -0700 Subject: [LLVMdev] Docs question: legality of inspecting other functions in a function pass In-Reply-To: <51D7BD21.7070702@free.fr> References: <1A9CAAEA-2840-459E-B860-F83D64743CC5@cl.cam.ac.uk> <51D6C435.2050400@free.fr> <51D7BD21.7070702@free.fr> Message-ID: > in my opinion function passes should not be inspecting the bodies of other > functions. I'm happy to review a doc patch that says this. > > Ciao, Duncan. > OK, thanks, I submitted a (one-line) patch to llvm-commits: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130701/180210.html I'll commit it in a few days if no one objects. Stephen From ahmed.bougacha at gmail.com Sat Jul 6 09:05:43 2013 From: ahmed.bougacha at gmail.com (Ahmed Bougacha) Date: Sat, 6 Jul 2013 09:05:43 -0700 Subject: [LLVMdev] Host compiler requirements: Dropping VS 2008, using C++11? Message-ID: Hi all, A few days ago, there was a report of LLVM not compiling on VS 2008, because of asymmetric std::lower_bound comparators not supported there. As noted by a few people, maybe it's time to drop VS 2008 compatibility and move the requirements to VS 2010? While there, what about going further and starting using C++11? Now seems as good a time as ever; my takeaway from that few months old discussion was that once 3.3 is released, it would be reasonable to start using features supported by VS2010 / gcc-4.4 / clang-3.1. That would be now, are there any objections left? 
-- Ahmed On Sat, Jul 6, 2013 at 1:02 AM, Tim Northover wrote: > > > There is some historical precedence for fixing the problem with VS > > lower_bound by changing the LLVM source - when I first got LLVM to compile > > with Visual Studio, patches for unsymmetric operator < were accepted into > > the LLVM repo, and I believe it's been done several times after that as > > well. > > In the C++11 discussion back in January > (http://llvm.1065342.n5.nabble.com/Using-C-11-language-features-in-LLVM-itself-td53319.html) > there seemed to be some kind of consensus for 2010 being a reasonable > minimum. Perhaps this is a good time to break compatibility > officially. > > Actually, whatever did happen to using C++11? No-one mentioned > anything about it after that thread. Valid points, raised in the commit thread. Changed the subject to get people's attention! > > Tim. From aaron at aaronballman.com Sat Jul 6 09:59:39 2013 From: aaron at aaronballman.com (Aaron Ballman) Date: Sat, 6 Jul 2013 12:59:39 -0400 Subject: [LLVMdev] [cfe-dev] Host compiler requirements: Dropping VS 2008, using C++11? In-Reply-To: References: Message-ID: I'm in favor of dropping VS 2008 support (in fact, I thought we had already talked about doing that, but perhaps I am remembering incorrectly). I think C++11 support should be a separate discussion than dropping VS 2008 support because it's likely to be a bit more in-depth, but I'm in favor of it. ~Aaron On Sat, Jul 6, 2013 at 12:05 PM, Ahmed Bougacha wrote: > Hi all, > > A few days ago, there was a report of LLVM not compiling on VS 2008, > because of asymmetric std::lower_bound comparators not supported > there. > > As noted by a few people, maybe it's time to drop VS 2008 > compatibility and move the requirements to VS 2010? > > While there, what about going further and starting using C++11? 
Now > seems as good a time as ever; my takeaway from that few months old > discussion was that once 3.3 is released, it would be reasonable to > start using features supported by VS2010 / gcc-4.4 / clang-3.1. That > would be now, are there any objections left? > > -- Ahmed > > On Sat, Jul 6, 2013 at 1:02 AM, Tim Northover wrote: >> >> > There is some historical precedence for fixing the problem with VS >> > lower_bound by changing the LLVM source - when I first got LLVM to compile >> > with Visual Studio, patches for unsymmetric operator < were accepted into >> > the LLVM repo, and I believe it's been done several times after that as >> > well. >> >> In the C++11 discussion back in January >> (http://llvm.1065342.n5.nabble.com/Using-C-11-language-features-in-LLVM-itself-td53319.html) >> there seemed to be some kind of consensus for 2010 being a reasonable >> minimum. Perhaps this is a good time to break compatibility >> officially. >> >> Actually, whatever did happen to using C++11? No-one mentioned >> anything about it after that thread. > > Valid points, raised in the commit thread. Changed the subject to get > people's attention! > >> >> Tim. > _______________________________________________ > cfe-dev mailing list > cfe-dev at cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev From avilella at gmail.com Sun Jul 7 02:21:43 2013 From: avilella at gmail.com (Albert Vilella) Date: Sun, 7 Jul 2013 10:21:43 +0100 Subject: [LLVMdev] trying to compile llvm+clang on CentOS 5 Message-ID: I am trying to have llvm and clang on a centOS 5 without root permissions. 
I tried to do it by downloading the llvm and clang src packages and trying the usual configure, make and make install steps, as such: wget http://llvm.org/releases/3.3/llvm-3.3.src.tar.gz wget http://llvm.org/releases/3.3/cfe-3.3.src.tar.gz tar xzf llvm-3.3.src.tar.gz && cd llvm-3.3.src/tools/ && tar xzf ../../cfe-3.3.src.tar.gz I tried with a newer version of gcc compiled for this 64bit CentOS system, because the older version wouldn't work (see below). When I try it with the newer version, I get this: export LD_LIBRARY_PATH=/home/avilella/src/gcc/gcc-4.7.2/lib64:/home/avilella/src/gcc/gcc-4.7.2/lib export CC=/home/avilella/src/gcc/gcc-4.7.2/bin/gcc export CXX=/home/avilella/src/gcc/gcc-4.7.2/bin/g++ export PATH=/home/avilella/src/python/python-2.7.3/bin:$PATH cd ~/src/llvm/latest/llvm-3.3.src ./configure --prefix=/home/avilella/src/llvm/latest/llvm && make clean && make -j8 && make install After these steps, I don't see clang in the bin directory: /home/avilella/src/llvm/latest/llvm/bin So I followed the instructions in the clang directory, and ran make -j8 on it: cd ~/src/llvm/latest/llvm-3.3.src/tools/cfe-3.3.src make -j8 Doing so, I get this clang/Config/config.h error: [...] InitHeaderSearch.cpp:17:51: fatal error: clang/Config/config.h: No such file or directory [...] This is mentioned in a bug report from 2011, which I would expect to be solved by now: http://llvm.org/bugs/show_bug.cgi?id=11903 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ofv at wanadoo.es Sun Jul 7 03:23:47 2013 From: ofv at wanadoo.es (=?utf-8?Q?=C3=93scar_Fuentes?=) Date: Sun, 07 Jul 2013 12:23:47 +0200 Subject: [LLVMdev] trying to compile llvm+clang on CentOS 5 References: Message-ID: <87obaept1o.fsf@wanadoo.es> Albert Vilella writes: > I am trying to have llvm and clang on a centOS 5 without root permissions. 
> I tried to do it downloading llvm and clang src packages and trying the > ususal configure, make and make install steps as such: > > wget http://llvm.org/releases/3.3/llvm-3.3.src.tar.gz > wget http://llvm.org/releases/3.3/cfe-3.3.src.tar.gz > tar xzf llvm-3.3.src.tar.gz && cd llvm-3.3.src/tools/ && tar xzf > ../../cfe-3.3.src.tar.gz This is wrong, because it doesn't create a directory named `clang' under $LLVM_SRC_ROOT/tools. Rename the directory created by the last tar command (`cfe-3.3.src') to `clang' and reconfigure. P.S.: it is recommended to use a separate build directory instead of building on the directory where the source code is. From klimek at google.com Sun Jul 7 10:11:57 2013 From: klimek at google.com (Manuel Klimek) Date: Sun, 7 Jul 2013 19:11:57 +0200 Subject: [LLVMdev] Phabricator down Message-ID: Hi, unfortunately phab has gone down and I currently don't have access to fix it up. I'll work on it first thing tomorrow, so ETA for it getting back is roughly 14 hours. Sorry! /Manuel -------------- next part -------------- An HTML attachment was scrubbed... URL: From tristan_schmelcher at alumni.uwaterloo.ca Sun Jul 7 15:09:45 2013 From: tristan_schmelcher at alumni.uwaterloo.ca (Tristan Schmelcher) Date: Sun, 7 Jul 2013 18:09:45 -0400 Subject: [LLVMdev] Andersen alias analysis Message-ID: Hello, I am writing to let the community know that I am working on a new implementation of Andersen alias analysis for LLVM. The code is quite young and not useful yet, but you can see what I have at https://github.com/TristanSchmelcher/llvm-andersen. From swlin at post.harvard.edu Sun Jul 7 23:49:50 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Sun, 7 Jul 2013 23:49:50 -0700 Subject: [LLVMdev] Link step timeout on buildbots Message-ID: Hi, I'm not sure if this is a configuration controlled centrally, but if it is, is there any way to disable or extend the process timeout on the lab.llvm.org buildbots? 
It doesn't seem to serve much purpose except to introduce false build failures, such as these from clang-x86_64-ubuntu: http://lab.llvm.org:8011/builders/clang-x86_64-ubuntu/builds/9519 http://lab.llvm.org:8011/builders/clang-x86_64-ubuntu/builds/9528 http://lab.llvm.org:8011/builders/clang-x86_64-ubuntu/builds/9529 Apologies if this has been asked before. Stephen From baldrick at free.fr Mon Jul 8 01:01:11 2013 From: baldrick at free.fr (Duncan Sands) Date: Mon, 08 Jul 2013 10:01:11 +0200 Subject: [LLVMdev] Link step timeout on buildbots In-Reply-To: References: Message-ID: <51DA71C7.2090004@free.fr> Hi Stephen, > I'm not sure if this is a configuration controlled centrally, but if > it is, is there any way to disable or extend the process timeout on > the lab.llvm.org buildbots? It doesn't seem to serve much purpose > except to introduce false build failures, such as these from > clang-x86_64-ubuntu: it does serve a purpose, since every now and again buildbots hang forever due to a miscompilation turning a finite loop in a compiled program into an infinite loop, or by an optimizer going into an infinite loop due to a bug in it. As for setting timeouts, the builders have a timeout parameter, take a look in llvm.org/svn/llvm-project/zorg/trunk/buildbot/osuosl/master/config Ciao, Duncan. > > http://lab.llvm.org:8011/builders/clang-x86_64-ubuntu/builds/9519 > http://lab.llvm.org:8011/builders/clang-x86_64-ubuntu/builds/9528 > http://lab.llvm.org:8011/builders/clang-x86_64-ubuntu/builds/9529 > > Apologies if this has been asked before. 
> > Stephen > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From chandlerc at google.com Mon Jul 8 03:15:09 2013 From: chandlerc at google.com (Chandler Carruth) Date: Mon, 8 Jul 2013 03:15:09 -0700 Subject: [LLVMdev] Phabricator down In-Reply-To: References: Message-ID: Just as a tiny update, Manuel is actively working on it, but a small issue has turned into a larger issue... stay tuned... On Sun, Jul 7, 2013 at 10:11 AM, Manuel Klimek wrote: > Hi, > > unfortunately phab has gone down and I currently don't have access to fix > it up. I'll work on it first thing tomorrow, so ETA for it getting back is > roughly 14 hours. > > Sorry! > /Manuel > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From klimek at google.com Mon Jul 8 05:17:53 2013 From: klimek at google.com (Manuel Klimek) Date: Mon, 8 Jul 2013 14:17:53 +0200 Subject: [LLVMdev] Phabricator down In-Reply-To: References: Message-ID: We should be back up - please let me know if anything doesn't work as expected... Cheers, /Manuel On Mon, Jul 8, 2013 at 12:15 PM, Chandler Carruth wrote: > Just as a tiny update, Manuel is actively working on it, but a small issue > has turned into a larger issue... stay tuned... > > > On Sun, Jul 7, 2013 at 10:11 AM, Manuel Klimek wrote: > >> Hi, >> >> unfortunately phab has gone down and I currently don't have access to fix >> it up. I'll work on it first thing tomorrow, so ETA for it getting back is >> roughly 14 hours. >> >> Sorry! >> /Manuel >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ricks at carbondesignsystems.com Mon Jul 8 05:19:15 2013 From: ricks at carbondesignsystems.com (Rick Sullivan) Date: Mon, 8 Jul 2013 08:19:15 -0400 Subject: [LLVMdev] Problem Using libLLVM-3.3.so Message-ID: <3E81A568A9EB894A9BD6AE7CDD3AA903478A2DFD6B@BE198.mail.lan> We're using the LLVM 3.3 AArch64 disassembler in the following way. We have built LLVM 3.3 on Linux as a shared library; and have a main program that dynamically loads shared objects (.so libraries). The program is a simulator (though that shouldn't be relevant to this question), and the shared objects it loads are electronic components that participate in the simulation. If the electronic component happens to be an ARM processor, it will make reference to the LLVM 3.3 shared library - specifically the AArch64 disassembler. The problem is this. For some simulations, the LLVM shared library seems to take a segfault on exit. It runs correctly, but when the simulator finishes, it crashes on exit. I've traced this back to the LLVM library by running the following experiment - run a known "good" simulator build without any references to LLVM, and observed that it runs correctly. Now rebuild the known "good" shared objects (the electronic components in the simulation), and link to the LLVM shared library. Still no references in the code to LLVM, just linking to the LLVM shared library. This causes the LLVM shared library to be loaded when the simulation is run; and this causes the failure on exit. Does anybody have any ideas as to why this might be happening? Rick Sullivan Carbon Design Systems 125 Nagog Park Rd, Acton, MA 01720 O: +1 (978) 264-7370 | M: +1 (508) 479-3845 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ofv at wanadoo.es Mon Jul 8 07:11:32 2013 From: ofv at wanadoo.es (=?utf-8?Q?=C3=93scar_Fuentes?=) Date: Mon, 08 Jul 2013 16:11:32 +0200 Subject: [LLVMdev] Problem Using libLLVM-3.3.so References: <3E81A568A9EB894A9BD6AE7CDD3AA903478A2DFD6B@BE198.mail.lan> Message-ID: <87ip0lp2ej.fsf@wanadoo.es> Rick Sullivan writes: [snip] > The problem is this. For some simulations, the LLVM shared library > seems to take a segfault on exit. It runs correctly, but when the > simulator finishes, it crashes on exit. [snip] > > Does anybody have any ideas as to why this might be happening? Can you run the application under gdb and obtain a backtrace? Using a LLVM Debug build may (or may not) help on this endeavour. 
From rafael.espindola at gmail.com Mon Jul 8 08:07:14 2013 From: rafael.espindola at gmail.com (=?UTF-8?Q?Rafael_Esp=C3=ADndola?=) Date: Mon, 8 Jul 2013 11:07:14 -0400 Subject: [LLVMdev] [RFC] Switching make check to use 'set -o pipefail' In-Reply-To: References: Message-ID: > Hmm, I don't know LLVM's Makefile system well enough to know the > easiest way to implement an option; if it's non-trivial then maybe > it's not worth it. That is my impression at least. These errors are somewhat easy to introduce, but also easy to fix. > I also don't know the workflow of most people doing out-of-tree work, > so I'm not sure how much impact this might have. It can obviously be > temporarily reverted locally pretty easily, but it assumes people are > paying attention to LLVMDev/llvm-commits and know what's going on. > Also, the commit causing all the new failures might not be as obvious > down the line to someone updating their tree irregularly (which is > probably true for a lot of academic LLVM users, I'm guessing.) > > Just curious, does using pipefail give any information about where in > the pipe the failure actually comes from? Some kind of message would > be useful for debugging purposes, in addition to explaining what's > going on to someone who wasn't watching dev lists and commit messages > carefully. With the external (shell based) testing we don't get anything. With the internal (python based) one we get the last failing command line. It doesn't say which component of it failed, but the vast majority of the pipes we have are of the form "command | FileCheck", so it is clear that "command" must be the one failing since the pipe was passing before enabling pipefail. 
Cheers, Rafael From tom at stellard.net Mon Jul 8 08:33:57 2013 From: tom at stellard.net (Tom Stellard) Date: Mon, 8 Jul 2013 08:33:57 -0700 Subject: [LLVMdev] [RFC] Switching make check to use 'set -o pipefail' In-Reply-To: References: Message-ID: <20130708153357.GC1764@L7-CNU1252LKR-172027226155.amd.com> Hi, I am in favor of using pipefail by default since it will reduce the potential for false positives in the test suite. On Mon, Jul 08, 2013 at 11:07:14AM -0400, Rafael Espíndola wrote: > > Hmm, I don't know LLVM's Makefile system well enough to know the > > easiest way to implement an option; if it's non-trivial then maybe > > it's not worth it. > > That is my impression at least. These errors are somewhat easy to > introduce, but also easy to fix. > FWIW, I don't think there is much value in adding a command line option for disabling pipefail. It shouldn't be too difficult for out of tree targets to fix their tests. -Tom > > I also don't know the workflow of most people doing out-of-tree work, > > so I'm not sure how much impact this might have. It can obviously be > > temporarily reverted locally pretty easily, but it assumes people are > > paying attention to LLVMDev/llvm-commits and know what's going on. > > Also, the commit causing all the new failures might not be as obvious > > down the line to someone updating their tree irregularly (which is > > probably true for a lot of academic LLVM users, I'm guessing.) > > > > Just curious, does using pipefail give any information about where in > > the pipe the failure actually comes from? Some kind of message would > > be useful for debugging purposes, in addition to explaining what's > > going on to someone who wasn't watching dev lists and commit messages > > carefully. > > With the external (shell based) testing we don't get anything. With > the internal (python based) one we get the last failing command line. 
> It doesn't say which component of it failed, but the vast majority of
> the pipes we have are of the form "command | FileCheck", so it is
> clear that "command" must be the one failing since the pipe was
> passing before enabling pipefail.
>
> Cheers,
> Rafael
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

From neleai at seznam.cz  Mon Jul  8 11:39:41 2013
From: neleai at seznam.cz (Ondřej Bílka)
Date: Mon, 8 Jul 2013 20:39:41 +0200
Subject: [LLVMdev] [RFC] Fix leading and trailing spaces
Message-ID: <20130708183941.GA2610@domone.kolej.mff.cuni.cz>

Hi,

I am writing a tool to simplify automated refactorings. One of the prerequisites is having a clean codebase, so that a refactorer can be simple and any formatting inconsistencies it creates can be eliminated by a formatter.

My plan to keep the codebase clean is to first run a cleanup systemwide, then keep it clean with a hook or by periodically rerunning the cleanup. I put it for now here: https://github.com/neleai/stylepp

I ran scripts that remove trailing whitespace and fix leading spaces followed by tabs on the llvm codebase; the commands are:

script/stylepp_skeleton stylepp_trailing_space and
script/stylepp_skeleton stylepp_space_after_tab

The rather large patches are here (is this the correct list, or should I split them somewhat?)

kam.mff.cuni.cz/~ondra/llvm_whitespace.patch
kam.mff.cuni.cz/~ondra/clang_whitespace.patch
kam.mff.cuni.cz/~ondra/compiler_rt_whitespace.patch

I do not know if there are directories that I should leave untouched, so could you tell me a list of them?

Please use flags that ignore whitespace, like:

hg annotate -w
git blame -w
patch -l
git apply --ignore-space

Could you also verify with git diff -w that each patch indeed touches only whitespace?
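The two cleanups described above can be sketched with plain GNU sed (this is an assumed, simplified stand-in for the stylepp scripts, not their actual implementation; the sample file name is invented):

```shell
#!/usr/bin/env bash
# Sample file: a trailing-space line, a tab-indented line, and a
# space-before-tab line.
printf 'int x;   \n\tindented\n \tspace_then_tab\n' > demo.c

sed -i 's/[ \t]*$//' demo.c    # trailing-whitespace analogue
sed -i 's/^ \+\t/\t/' demo.c   # space-before-tab analogue

cat -A demo.c   # '$' marks line ends; no stray blanks remain before them
```

Running such a script tree-wide is mechanical; the controversial part, as the rest of the thread shows, is the churn it creates in version-control history.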
From bigcheesegs at gmail.com Mon Jul 8 11:43:19 2013 From: bigcheesegs at gmail.com (Michael Spencer) Date: Mon, 8 Jul 2013 11:43:19 -0700 Subject: [LLVMdev] [cfe-dev] Host compiler requirements: Dropping VS 2008, using C++11? In-Reply-To: References: Message-ID: On Sat, Jul 6, 2013 at 9:05 AM, Ahmed Bougacha wrote: > Hi all, > > A few days ago, there was a report of LLVM not compiling on VS 2008, > because of asymmetric std::lower_bound comparators not supported > there. > > As noted by a few people, maybe it's time to drop VS 2008 > compatibility and move the requirements to VS 2010? > > While there, what about going further and starting using C++11? Now > seems as good a time as ever; my takeaway from that few months old > discussion was that once 3.3 is released, it would be reasonable to > start using features supported by VS2010 / gcc-4.4 / clang-3.1. That > would be now, are there any objections left? > > -- Ahmed > I'm also in favor of dropping VS 2008 if nobody has a strong need for it, but C++11 is a different discussion. Does anyone know if we have any contributors that rely on VS 2008? - Michael Spencer -------------- next part -------------- An HTML attachment was scrubbed... URL: From chandlerc at google.com Mon Jul 8 11:47:55 2013 From: chandlerc at google.com (Chandler Carruth) Date: Mon, 8 Jul 2013 11:47:55 -0700 Subject: [LLVMdev] [cfe-dev] Host compiler requirements: Dropping VS 2008, using C++11? In-Reply-To: References: Message-ID: On Sat, Jul 6, 2013 at 9:59 AM, Aaron Ballman wrote: > I'm in favor of dropping VS 2008 support (in fact, I thought we had > already talked about doing that, but perhaps I am remembering > incorrectly). > +1 -- we're good with dropping VS 2008, especially if we document it clearly in the release notes. I think C++11 support should be a separate discussion than dropping VS > 2008 support because it's likely to be a bit more in-depth, but I'm in > favor of it. 
> +1

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From benjamin.kemper at intel.com  Sun Jul  7 23:03:16 2013
From: benjamin.kemper at intel.com (Kemper, Benjamin)
Date: Mon, 8 Jul 2013 06:03:16 +0000
Subject: [LLVMdev] Status of support in dwarf4
Message-ID: <54F82761435CF34F96720D1930FE6228018D6B4B@HASMSX103.ger.corp.intel.com>

Hi,

I've built the llvm-dwarfdump tool from the 3.3 branch and ran it on a simple binary compiled with dwarf4 debug information.

The tool seems to read the file correctly, including dwarf information (DIContext::getDWARFContext returns a valid context), but when trying to access line information, all I get is a "" string for the file name and 0's for the line and column information.

I've seen that there was a talk at the recent Europe lldb-dev meeting but couldn't figure out the current status from the slides.

Am I missing something, or is there still no support for parsing dwarf4 info?

Thanks,
Benjamin.

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From echristo at gmail.com  Mon Jul  8 12:45:47 2013
From: echristo at gmail.com (Eric Christopher)
Date: Mon, 8 Jul 2013 12:45:47 -0700
Subject: [LLVMdev] Status of support in dwarf4
In-Reply-To: <54F82761435CF34F96720D1930FE6228018D6B4B@HASMSX103.ger.corp.intel.com>
References: <54F82761435CF34F96720D1930FE6228018D6B4B@HASMSX103.ger.corp.intel.com>
Message-ID:

> I've built the llvm-dwarfdump tool from the 3.3 branch and ran it on simple
> binary compiled with dwarf4 debug information.
> > > > The tool seem to read the file correctly including dwarf information > (DIContext::getDWARFContext returns a valid context), but when trying to > access line information, all I get is "" string for the file name > and 0's for line and column information. > > > > I've seen that there was a talk in the recent Europoe lldb-dev meeting but > couldn't figure out the current status from the slides. > > > > Am I missing something or there still no support for parsing dwarf4 info? > You're missing something. Basically there is some support for parsing it in llvm-dwarfdump, but that tool is largely used for testing the code emission of llvm and clang. If there's something missing/needed we've been adding it as we go along. Line information is one of those that needs to be added. llvm support of dwarf4 itself is pretty decent with just a few features of dwarf4 that aren't used. -eric > Thanks, > > Benjamin. > > --------------------------------------------------------------------- > Intel Israel (74) Limited > > This e-mail and any attachments may contain confidential material for > the sole use of the intended recipient(s). Any review or distribution > by others is strictly prohibited. If you are not the intended > recipient, please contact the sender and delete all copies. > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From echristo at gmail.com Mon Jul 8 12:47:19 2013 From: echristo at gmail.com (Eric Christopher) Date: Mon, 8 Jul 2013 12:47:19 -0700 Subject: [LLVMdev] Status of support in dwarf4 In-Reply-To: References: <54F82761435CF34F96720D1930FE6228018D6B4B@HASMSX103.ger.corp.intel.com> Message-ID: On Mon, Jul 8, 2013 at 12:45 PM, Eric Christopher wrote: >> I've built the llvm-dwarfdump tool from the 3.3 branch and ran it on simple >> binary compiled with dwarf4 debug information. 
>> >> >> >> The tool seem to read the file correctly including dwarf information >> (DIContext::getDWARFContext returns a valid context), but when trying to >> access line information, all I get is "" string for the file name >> and 0's for line and column information. >> >> >> >> I've seen that there was a talk in the recent Europoe lldb-dev meeting but >> couldn't figure out the current status from the slides. >> >> >> >> Am I missing something or there still no support for parsing dwarf4 info? >> > > You're missing something. Basically there is some support for parsing > it in llvm-dwarfdump, but that tool is largely used for testing the > code emission of llvm and clang. If there's something missing/needed > we've been adding it as we go along. Line information is one of those > that needs to be added. > Slight clarification: We can dump the line information we output, but I've definitely seen binaries output by gcc that we can't dump. I'm not sure what the issue is and haven't looked yet. -eric > llvm support of dwarf4 itself is pretty decent with just a few > features of dwarf4 that aren't used. > > -eric > >> Thanks, >> >> Benjamin. >> >> --------------------------------------------------------------------- >> Intel Israel (74) Limited >> >> This e-mail and any attachments may contain confidential material for >> the sole use of the intended recipient(s). Any review or distribution >> by others is strictly prohibited. If you are not the intended >> recipient, please contact the sender and delete all copies. 
>> >> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>

From swlin at post.harvard.edu  Mon Jul  8 15:10:24 2013
From: swlin at post.harvard.edu (Stephen Lin)
Date: Mon, 8 Jul 2013 15:10:24 -0700
Subject: [LLVMdev] API break for out-of-tree targets implementing TargetLoweringBase::isFMAFasterThanMulAndAdd
Message-ID:

Hello,

To any out-of-tree targets, please be aware that I intend to commit a patch that will break the build of any target implementing TargetLoweringBase::isFMAFasterThanMulAndAdd, for the reasons described below. (Basically, the current interface definition is broken and not followed, and no in-tree target was doing the right thing with it, so it is unlikely any out-of-tree target is either...)

To un-break your build after this patch goes through, you will need to rename isFMAFasterThanMulAndAdd to isFMAFasterThanFMulAndFAdd and ensure that it returns true for any type that eventually legalizes to a type for which FMAs are faster than FMul + FAdd (which usually means you have hardware support for the operation). You can look at in-tree target implementations as an example.

Please let me know if there are any objections before tomorrow morning.

Stephen

---------- Forwarded message ----------
From: Stephen Lin
Date: Sun, Jul 7, 2013 at 9:25 PM
Subject: [PATCH] Resolve issues with fmuladd intrinsic handling across multiple backends
To: llvm-commits at cs.uiuc.edu

Hi,

While working on another patch, I discovered multiple related issues with fmuladd intrinsic handling which I believe the attached patch resolves.

Currently, the operation depends on the target's implementation of the virtual function TargetLoweringBase::isFMAFasterThanMulAndAdd, the comments of which currently claim:

- /// isFMAFasterThanMulAndAdd - Return true if an FMA operation is faster than
- /// a pair of mul and add instructions.
fmuladd intrinsics will be expanded to
- /// FMAs when this method returns true (and FMAs are legal), otherwise fmuladd
- /// is expanded to mul + add.

The "and FMAs are legal" portion of the above comment is simply a lie; the legality of FMA operations is not checked before lowering fmuladd to ISD::FMA; however, the AArch64, SystemZ, and X86 implementations of this function are coded assuming that legality is checked and thus simply return true. This results in the following issues:

1. On X86(-64) targets, ISD::FMA nodes are formed when lowering fmuladd intrinsics even if the subtarget does not support FMA instructions, leading to laughably bad code generation in some situations (for example, compiling a call to "@llvm.fmuladd.v16f32(<16 x float> %a, <16 x float> %b, <16 x float> %c)" without "-mattr=+fma" or "-mattr=+fma4" results in 16 calls to the fmaf libm function instead of AVX muls and adds).

2. On AArch64 targets, ISD::FMA nodes are formed for operations on fp128, resulting in a call to a software fp128 FMA implementation rather than a software fp128 multiply and a software fp128 add. This does not seem to be the intended behavior given the comment above; however, I am not sure if this is actually better or worse, since neither set of operations is supported in hardware... does anyone know which is likely to be faster?

However, on further investigation, it seems like not checking the legality of an FMA operation in lowering fmuladd intrinsics is a feature, not a bug, since it allows formation of FMAs with types like v16f32, as long as they legalize (via splitting, scalarization, promotion, etc.) to types that support FMAs; it turns out this case is explicitly tested for in test/CodeGen/X86/wide-fma-contraction.ll.

So the proper solution, it seems, is for TargetLoweringInfo::isFMAFasterThanMulAndAdd implementations to check types and preconditions rather than depending on the caller to do so.
The last in-tree target to implement this function, PowerPC, actually does check types; however, it only checks for the specific legal types, and therefore the following occurs:

3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics on types like v2f32, v8f32, v4f64, etc., even though they promote, split, scalarize, etc. to types that support hardware FMAs.

This patch resolves all these issues by modifying the implementations of this virtual to check for subtarget features and for EVTs that legalize to MVTs that support hardware FMAs. (If I've understood the legalization rules correctly, then it turns out the latter can always be done in these targets just by checking the scalar type, but this is not the correct solution in the general case.) Comments are adjusted accordingly. (For the AArch64 backend in particular, I made the assumption, for now, that FMAs should not be formed on fp128, but this can be changed with a one-line fix.)

Since all current implementations of this virtual were buggy and the comment describing its usage was incorrect, I've also decided to rename it from "TargetLoweringBase::isFMAFasterThanMulAndAdd" to "TargetLoweringBase::isFMAFasterThanFMulAndFAdd" to force a merge conflict for any out-of-tree target implementing it; the new name is more accurate and consistent with the name of other functions in this interface anyway*. I will send a message to LLVMDev describing the change and the appropriate fixes required if/when it goes through. (Suggestions for a better solution to this problem are welcome!)

Also, I took this opportunity to modify DAGCombiner to only check for type legality when forming ISD::FMA nodes (in unsafe math mode) after legalization, since this is allowed by the new function interface definition (as long as isFMAFasterThanFMulAndFAdd returns true for the type).
In practice with current target implementations this change seems to be a no-op, since the FMA will be formed after legalization anyway; however, there doesn't seem to be any harm in forming the FMA earlier if possible.

No current tests are affected by this change; I've added tests checking that the specific symptoms described above are fixed, as well as some other sanity checks, just in case.

Please review and let me know if you have any comments! In particular, if anyone has any definite knowledge about what the correct behavior of fmuladd of fp128 on AArch64 should be, please let me know.

Thanks,
Stephen

* fast integer FMA instructions are at least theoretically possible, I think (?); however, the current comment already indicates that the function is used specifically to lower fmuladd intrinsics so "FMul" and "FAdd" are more accurate.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fma-faster-check-types.patch
Type: application/octet-stream
Size: 26781 bytes
Desc: not available
URL: From clattner at apple.com  Mon Jul  8 16:14:40 2013
From: clattner at apple.com (Chris Lattner)
Date: Mon, 08 Jul 2013 16:14:40 -0700
Subject: [LLVMdev] [RFC] Fix leading and trailing spaces
In-Reply-To: <20130708183941.GA2610@domone.kolej.mff.cuni.cz>
References: <20130708183941.GA2610@domone.kolej.mff.cuni.cz>
Message-ID:

On Jul 8, 2013, at 11:39 AM, Ondřej Bílka wrote:

> Hi,
>
> I am writing tool to simplify automated refactorings. One of
> prerequisites is have clean codebase, so a refactorer can be simple and
> created formatting inconsistencies can be eliminated by formatter.

Cool.

> My plan to keep codebase clean is first run a cleanup systemwide, then
> keep it by hook/ periodicaly rerunning cleanup.

Please don't do this. We don't like widespread changes like this, they make svn archeology more difficult.
-Chris

From gkistanova at gmail.com  Mon Jul  8 16:46:13 2013
From: gkistanova at gmail.com (Galina Kistanova)
Date: Mon, 8 Jul 2013 16:46:13 -0700
Subject: [LLVMdev] LLVM buildmaster will be restarted soon
Message-ID:

Hello everyone,

I am going to update and restart LLVM buildmaster in a half hour.

Thanks

Galina
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From eliben at google.com  Mon Jul  8 16:50:51 2013
From: eliben at google.com (Eli Bendersky)
Date: Mon, 8 Jul 2013 16:50:51 -0700
Subject: [LLVMdev] Special cased global-to-local-in-main replacement in GlobalOpt
Message-ID:

Hello,

GlobalOpt has an interesting special-case optimization for globals that are only accessed within "main". These globals are replaced by allocas within the "main" function (and the GV itself is deleted). The full condition for this happening is:

  // If this is a first class global and has only one accessing function
  // and this function is main (which we know is not recursive we can make
  // this global a local variable) we replace the global with a local alloca
  // in this function.
  //
  // NOTE: It doesn't make sense to promote non single-value types since we
  // are just replacing static memory to stack memory.
  //
  // If the global is in different address space, don't bring it to stack.
  if (!GS.HasMultipleAccessingFunctions && GS.AccessingFunction && !GS.HasNonInstructionUser && GV->getType()->getElementType()->isSingleValueType() && GS.AccessingFunction->getName() == "main" && GS.AccessingFunction->hasExternalLinkage() && GV->getType()->getAddressSpace() == 0) {

From today's discussion on IRC, there appear to be two problems with this approach:

1) The hard-coding of "main" to mean "entry point to the code" that only dynamically runs once.
2) Assuming that "main" cannot be recursive (in the general sense).
(1) is a problem for non-traditional compilation flows such as simple JITing of freestanding code where "main" is not the entry point; another case is PNaCl, where "main" is not the entry point ("_start" is), and others where parts of the runtime environment are included in the IR together with the user code. This is not the only place where the name "main" is hard-coded within the LLVM code base, but it's a good example.

(2) is a problem because the C standard, unlike the C++ standard, says nothing about "main" not being recursive. C++11 says in 3.6.1: "The function main shall not be used within a program." C does not appear to mention such a restriction, which may make the optimization invalid for C.

A number of possible solutions were raised: some sort of function attribute that marks an entry point, module-level entry point, documenting that LLVM assumes that the entry point is always renamed to "main", etc. These mostly address (1) but not (2).

Any thoughts and suggestions are welcome.

Eli
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From shining.llvm at gmail.com  Mon Jul  8 23:57:23 2013
From: shining.llvm at gmail.com (ning Shi)
Date: Tue, 9 Jul 2013 14:57:23 +0800
Subject: [LLVMdev] Kaleidoscope Tutorial is Out of Date
In-Reply-To: References: Message-ID:

The new version of the docs has changed that.
http://llvm.org/docs/tutorial/LangImpl4.html

2013/7/5 Andy Jost

> I'm working through the Kaleidoscope tutorials for LLVM 3.3 and noticed
> the code listing for chapter 4 is out of date on the web.
>
> Take a look at http://llvm.org/releases/3.3/docs/tutorial/LangImpl4.html.
> The first line includes llvm/DerivedTypes.h, but this does not compile.
> The correct path is llvm/IR/DerivedTypes.h.
There are other differences,
> and the examples that I pulled along with the source are correct.
>
> -Andy
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From neleai at seznam.cz  Mon Jul  8 23:05:24 2013
From: neleai at seznam.cz (Ondřej Bílka)
Date: Tue, 9 Jul 2013 08:05:24 +0200
Subject: [LLVMdev] [RFC] Fix leading and trailing spaces
In-Reply-To: References: <20130708183941.GA2610@domone.kolej.mff.cuni.cz>
Message-ID: <20130709060524.GA5650@domone.kolej.mff.cuni.cz>

On Mon, Jul 08, 2013 at 04:14:40PM -0700, Chris Lattner wrote:
>
> On Jul 8, 2013, at 11:39 AM, Ondřej Bílka wrote:
>
> > Hi,
> >
> > I am writing tool to simplify automated refactorings. One of
> > prerequisites is have clean codebase, so a refactorer can be simple and
> > created formatting inconsistencies can be eliminated by formatter.
>
> Cool.
>
> > My plan to keep codebase clean is first run a cleanup systemwide, then
> > keep it by hook/ periodicaly rerunning cleanup.
>
> Please don't do this. We don't like widespread changes like this, they make svn archeology more difficult.
>
Please use git. It has many features that svn lacks; git blame -w is one of them.

Also, this change is not because these spaces are inconsistent but because they cause problems. When you view a diff where some spaces are followed by tabs, the formatting can get weird because the tabs are shifted. Also, with trailing spaces a long line will wrap to a new line which will look empty.

But as far as I looked, most of the lines that I changed are empty anyway, so we do not lose much history.
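The effect of the -w flags mentioned in this exchange is easy to verify in a throwaway repository (the file name, contents, and author names below are invented for the demonstration):

```shell
#!/usr/bin/env bash
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q .

printf 'int x;\n' > f.c && git add f.c
git -c user.name=alice -c user.email=alice@example.com commit -q -m 'add f.c'

printf 'int x;   \n' > f.c && git add f.c   # whitespace-only change
git -c user.name=bob -c user.email=bob@example.com commit -q -m 'whitespace'

# Plain blame charges the line to the whitespace commit; -w ignores
# whitespace when comparing, so the original author is preserved.
git blame --line-porcelain f.c | grep '^author '      # author bob
git blame -w --line-porcelain f.c | grep '^author '   # author alice
```

So after a tree-wide whitespace cleanup, annotation remains usable as long as people remember to pass -w.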
From triple.yang at gmail.com  Tue Jul  9 04:56:10 2013
From: triple.yang at gmail.com (杨勇勇)
Date: Tue, 9 Jul 2013 19:56:10 +0800
Subject: [LLVMdev] A problem on returning value for functions
Message-ID:

Hi, I am writing a backend and came across an abnormal problem. Here I give an example to describe it:

///////////////////////////////////////////////////////////////////////////////////////////
// A simple C function
int foo() { return 1234; }
/////////////////////////////////////////////////////////////////////////////////////////

When compiling foo() into my target ISA, I would expect code like:

/////////////////////////////////////////////////////////////////////////////////////////
...
movi r0, #1234 // prepare r0 to return the value 1234.
...
ret // return to caller.
////////////////////////////////////////////////////////////////////////////////////////

The headache is that when I pass the option -O0 to llc, the generated code is correct. However, if I omit -O0 and use the default compilation options, the instruction "movi r0, #1234" does not show up.

I have already checked the DAGs using the -view-xxxxx-dags options and they all seem OK. I do not have any clue right now. Please help me. Thank you.

--
杨勇勇 (Yang Yong-Yong)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From t.p.northover at gmail.com  Tue Jul  9 05:07:49 2013
From: t.p.northover at gmail.com (Tim Northover)
Date: Tue, 9 Jul 2013 13:07:49 +0100
Subject: [LLVMdev] A problem on returning value for functions
In-Reply-To: References: Message-ID:

Hi,

> The headache is when I pass option -O0 to llc, the generated codes are
> correct. However, if I omit -O0 and use default compiling options, the
> instruction "movi r0, #1234" does not show.

It's probably being eliminated as dead code. You want to make sure that during ISelLowering your RET instruction has %R0 as one of its operands (check in the -view-isel-dags step).

That's the most likely cause anyway.
If not, posting the DAG might help, or the output of "llc -debug" on an equivalent .ll file.

Cheers.

Tim.

From ricks at carbondesignsystems.com  Tue Jul  9 05:17:59 2013
From: ricks at carbondesignsystems.com (Rick Sullivan)
Date: Tue, 9 Jul 2013 08:17:59 -0400
Subject: [LLVMdev] Problem Using libLLVM-3.3.so
In-Reply-To: <87ip0lp2ej.fsf@wanadoo.es>
References: <3E81A568A9EB894A9BD6AE7CDD3AA903478A2DFD6B@BE198.mail.lan> <87ip0lp2ej.fsf@wanadoo.es>
Message-ID: <3E81A568A9EB894A9BD6AE7CDD3AA903478A2DFDA1@BE198.mail.lan>

Unfortunately, I haven't been able to get the failure to occur in gdb. Our crash handler generates a backtrace, but it doesn't supply much information:

0: /o/release/SoCD/mainline/PRODUCT/Linux/bin/Linux//release/socdesigner(_ZN12CrashHandler12GetBacktraceEPc+0x2b) [0x8209adb] CrashHandler::GetBacktrace(char*)
1: /o/release/SoCD/mainline/PRODUCT/Linux/bin/Linux//release/socdesigner(_ZN12CrashHandler14GenerateReportEv+0x204) [0x820a434] CrashHandler::GenerateReport()
2: /o/release/SoCD/mainline/PRODUCT/Linux/bin/Linux//release/socdesigner(_ZN12CrashHandler21DoAllReportGenerationEv+0x1c) [0x820a51c] CrashHandler::DoAllReportGeneration()
3: /o/release/SoCD/mainline/PRODUCT/Linux/bin/Linux//release/socdesigner(_ZN12CrashHandler9GotSignalEv+0x6a6) [0x820ac86] CrashHandler::GotSignal()
4: /o/release/SoCD/mainline/PRODUCT/Linux/bin/Linux//release/socdesigner(_Z14CSignalHandleri+0x6c) [0x820b05c] CSignalHandler(int)
5: /lib/tls/libc.so.6 [0xc249b8]

I've also tried running simulations without using libLLVM-3.3.so, but instead statically linking the required LLVM .a libraries into the component shared objects. This is not an ideal solution, because it effectively bundles LLVM into each component, more than doubling the size of each component on disk. However, this also produces a crash with a more meaningful stack trace. Whether this is related to the failures I'm seeing with the LLVM shared object - I don't know.
Here is the valgrind report: ==15093== Jump to the invalid address stated on the next line ==15093== at 0x0: ??? ==15093== by 0x285C2321: llvm::cl::parser<(anonymous namespace)::DefaultOnOff>::~parser() (CommandLine.h:629) ==15093== by 0x285C23D9: llvm::cl::opt<(anonymous namespace)::DefaultOnOff, false, llvm::cl::parser<(anonymous namespace)::DefaultOnOff>::~opt() (CommandLine.h:1 ==15093== by 0x285A3BD1: __tcf_5 (DwarfDebug.cpp:85) ==15093== by 0x441867: __cxa_finalize (in /lib/tls/libc-2.3.4.so) ==15093== by 0x282212F2: (within libCORTEXA9MP.mx_DBG.so) ==15093== by 0x28EEBB05: (within libCORTEXA9MP.mx_DBG.so) ==15093== by 0x518C41: _dl_close (in /lib/tls/libc-2.3.4.so) ==15093== by 0x56DD59: dlclose_doit (in /lib/libdl-2.3.4.so) ==15093== by 0x40966D: _dl_catch_error (in /lib/ld-2.3.4.so) ==15093== by 0x56E2BA: _dlerror_run (in /lib/libdl-2.3.4.so) ==15093== by 0x56DD89: dlclose (in /lib/libdl-2.3.4.so) -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Óscar Fuentes Sent: Monday, July 08, 2013 10:12 AM To: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Problem Using libLLVM-3.3.so Rick Sullivan writes: [snip] > The problem is this. For some simulations, the LLVM shared library > seems to take a segfault on exit. It runs correctly, but when the > simulator finishes, it crashes on exit. [snip] > > Does anybody have any ideas as to why this might be happening? Can you run the application under gdb and obtain a backtrace? Using a LLVM Debug build may (or may not) help on this endeavour. 
_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

From cristall at eleveneng.com  Tue Jul  9 08:01:15 2013
From: cristall at eleveneng.com (Sam Cristall)
Date: Tue, 09 Jul 2013 09:01:15 -0600
Subject: [LLVMdev] EVT::isRound on non-8-bit byte targets
Message-ID: <51DC25BB.5070701@eleveneng.com>

I'm new to LLVM dev, but I have been working with a target with a minimum addressable byte of 16 bits. I found that in DAGCombiner::visitAND, EVT::isRound could create i8 loads on my 16-bit target which are ultimately invalid. EVT::isRound appears to use a hard-coded 8, rather than pulling the target's BitsPerByte field. Is this a potential bug, or is there a better way to address this? Hard-coding a 16 in the isRound check fixes the issue for me.

Cheers,
Sam

From triple.yang at gmail.com  Tue Jul  9 08:17:06 2013
From: triple.yang at gmail.com (杨勇勇)
Date: Tue, 9 Jul 2013 23:17:06 +0800
Subject: [LLVMdev] A problem on returning value for functions
In-Reply-To: References: Message-ID:

Thank you, this is very instructive. I soon realized I forgot to add SDNPVariadic to my node definition of the return operator, and thus, even though LowerReturn() is implemented properly, the instructions for passing the return value are eliminated.

Regards.

2013/7/9 Tim Northover

> Hi,
>
> > The headache is when I pass option -O0 to llc, the generated codes are
> > correct. However, if I omit -O0 and use default compiling options, the
> > instruction "movi r0, #1234" does not show.
>
> It's probably being eliminated as dead code. You want to make sure
> that during ISelLowering your RET instruction has %R0 as one of its
> operands (check in the -view-isel-dags step).
>
> That's the most likely cause anyway. If not, posting the DAG might
> help, or the output of "llc -debug" on an equivalent .ll file.
>
> Cheers.
>
> Tim.
--
杨勇勇 (Yang Yong-Yong)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From baldrick at free.fr  Tue Jul  9 08:29:15 2013
From: baldrick at free.fr (Duncan Sands)
Date: Tue, 09 Jul 2013 17:29:15 +0200
Subject: [LLVMdev] EVT::isRound on non-8-bit byte targets
In-Reply-To: <51DC25BB.5070701@eleveneng.com>
References: <51DC25BB.5070701@eleveneng.com>
Message-ID: <51DC2C4B.1000103@free.fr>

Hi Sam,

On 09/07/13 17:01, Sam Cristall wrote:
> I'm new to LLVM dev, but I have been working with a target with a
> minimum addressable byte of 16-bits. I found that in
> DAGCombiner::visitAND, EVT::isRound could create i8 loads on my 16-bit
> target which are ultimately invalid. EVT::isRound appears to use a
> hard-coded 8, rather than pulling the targets BitsPerByte field. Is this
> a potential bug or is there a better way to address this? Hard coding a
> 16 in the isRound field fixes the issue for me.

last time I checked LLVM did not have a BitsPerByte field. Are you working with a private copy of LLVM where someone has added support for non-octet bytes? If so, I guess they forgot to adjust isRound.

Ciao, Duncan.

From cristall at eleveneng.com  Tue Jul  9 08:39:13 2013
From: cristall at eleveneng.com (Sam Cristall)
Date: Tue, 09 Jul 2013 09:39:13 -0600
Subject: [LLVMdev] EVT::isRound on non-8-bit byte targets
In-Reply-To: <51DC2C4B.1000103@free.fr>
References: <51DC2C4B.1000103@free.fr>
Message-ID: <51DC2EA1.8020507@eleveneng.com>

Hi Duncan,

It appears you are correct -- I didn't realize this was a wart of my fork, thank you for your time!

Cheers,
Sam

From westdac at gmail.com  Tue Jul  9 10:17:08 2013
From: westdac at gmail.com (Dan)
Date: Tue, 9 Jul 2013 11:17:08 -0600
Subject: [LLVMdev] Optimization issue for target's offset field of load operation in DAGSelection
Message-ID:

I am working on an experimental target and trying to make sure that the load offset field is used in the best way.
There appears to be some control over the architecture's offset range, i.e., over whether an offset is too large and needs to be lowered/converted into a separate sequence of operations during DAG selection. Can someone point me to where this is handled? For example, the difference between index=63 and index=64 causes the address+offset to be generated as a separate operation rather than being folded into the load instruction. In my architecture, larger offsets are supported, so 63 versus 64 should not be the dividing line. Is there a limit on the offset ranges that effectively applies to all targets, or is some constraint set for my target causing this? Suggestions? long long array[100]; long long func() { return array[63]; // return array[64]; } Here is the difference in the .ll code with the 63 or 64 as the index: %0 = load i64* getelementptr inbounds ([10000 x i64]* @array, i32 0, i64 63), align 8 ret i64 %0 %0 = load i64* getelementptr inbounds ([10000 x i64]* @array, i32 0, i64 64), align 8 ret i64 %0 Here is the Instruction Selection for size 63: ISEL: Starting pattern match on root node: 0x3d9ad80: i64,ch = load 0x3d866f8, 0x3d9aa80, 0x3d9ac80 [ORD=2] [ID=6] Initial Opcode index to 813 Skipped scope entry (due to false predicate) at index 822, continuing at 876 Skipped scope entry (due to false predicate) at index 877, continuing at 931 TypeSwitch[i64] from 934 to 937 Morphed node: 0x3d9ad80: i64,ch = LDWri 0x3d9a880, 0x3d9ab80, 0x3d866f8 [ORD=2] ISEL: Match complete! ===== Instruction selection ends: Here is the Instruction Selection for size 64: ISEL: Match complete!
ISEL: Starting pattern match on root node: 0x2d2cda0: i64,ch = load 0x2d18718, 0x2d2caa0, 0x2d2cca0 [ORD=2] [ID=6] Initial Opcode index to 813 Skipped scope entry (due to false predicate) at index 822, continuing at 876 Skipped scope entry (due to false predicate) at index 877, continuing at 931 TypeSwitch[i64] from 934 to 937 Morphed node: 0x2d2cda0: i64,ch = LDWri 0x2d2caa0, 0x2d2cba0, 0x2d18718 [ORD=2] ISEL: Match complete! ISEL: Starting pattern match on root node: 0x2d2caa0: i64 = add 0x2d2c8a0, 0x2d2c9a0 [ORD=1] [ID=5] Initial Opcode index to 1473 Match failed at index 1482 Continuing at 1498 Morphed node: 0x2d2caa0: i64 = ADD 0x2d2c8a0, 0x2d2c9a0 [ORD=1] etc From esotericmoment at live.com Tue Jul 9 10:35:35 2013 From: esotericmoment at live.com (Clay J) Date: Tue, 9 Jul 2013 13:35:35 -0400 Subject: [LLVMdev] Basic instructions for LLVM and Control Flow graph extraction Message-ID: I am currently attempting to learn how to use LLVM for control flow graph extraction on linux (Ubuntu). Basically, I need to be able to break down specific basic functions blocks from assembly code, and use it to make a CFG. Do any of you upstanding human beings have any knowledge or resources that could possibly assist me in this task? I apologize if this is a very basic question. I have already installed the proper files/programs. Thank you in advance. -------------- next part -------------- An HTML attachment was scrubbed... URL: From nlewycky at google.com Tue Jul 9 10:39:19 2013 From: nlewycky at google.com (Nick Lewycky) Date: Tue, 9 Jul 2013 10:39:19 -0700 Subject: [LLVMdev] llvm bay-area social in july In-Reply-To: References: Message-ID: On 24 June 2013 18:58, Nick Lewycky wrote: > Hi everyone! Since the first Thursday in July lands on July 4th, I'll need > to move the date. The new plan is to meet at the usual spot, but on the > following Tuesday, July 9th. > Reminder: the social is today! (Tuesday!? 
We had to skip Thursday July 4th, and this was the next day I expect everyone to be back from holiday vacations.) Please let us know if you'll be coming at: http://llvmbayarea.appspot.com/ > > Nick > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kparzysz at codeaurora.org Tue Jul 9 10:46:55 2013 From: kparzysz at codeaurora.org (Krzysztof Parzyszek) Date: Tue, 09 Jul 2013 12:46:55 -0500 Subject: [LLVMdev] Optimization issue for target's offset field of load operation in DAGSelection In-Reply-To: References: Message-ID: <51DC4C8F.2030304@codeaurora.org> On 7/9/2013 12:17 PM, Dan wrote: > I am working on an experimental target and trying to make sure that > the load offset field is used to the best way. There appears to be > some control over the architecture's offset range and whether the > offset is too large and needs to be lowered/converted into a separate > sequence of operations in DAGSelection? > > Can someone point me to what might be the case? Instruction patterns can have predicates on each operand to make sure that the operand meets the required criteria. For example, in lib/Target/PowerPC/PPCInstrInfo.td, there is a definition of ADDI: def ADDI : DForm_2<14, (outs gprc:$rD), (ins gprc_nor0:$rA, symbolLo:$imm), "addi $rD, $rA, $imm", IntSimple, [(set i32:$rD, (add i32:$rA, immSExt16:$imm))]>; The "immSExt16" is a predicate, and it's defined in the same file: def immSExt16 : PatLeaf<(imm), [{ // immSExt16 predicate - True if the immediate fits in a 16-bit // sign extended field. Used by instructions like 'addi'. if (N->getValueType(0) == MVT::i32) return (int32_t)N->getZExtValue() == (short)N->getZExtValue(); else return (int64_t)N->getZExtValue() == (short)N->getZExtValue(); }]>; In this case, the ADDI will be generated only if the immediate operand satisfies the predicate. Otherwise, the ADDI pattern won't match, and the instruction selector will attempt to match other patterns. 
In this case it will most likely be the immediate by itself (loaded into a register), and then the pattern for ADD (register+register) will match on the result. -Krzysztof -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation From swlin at post.harvard.edu Tue Jul 9 11:24:02 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Tue, 9 Jul 2013 11:24:02 -0700 Subject: [LLVMdev] API break for out-of-tree targets implementing TargetLoweringBase::isFMAFasterThanMulAndAdd In-Reply-To: References: Message-ID: OK, the patch is committed as r185956, and is guaranteed to break either the merge or build of any out-of-tree target implementing TargetLoweringBase::isFMAFasterThanMulAndAdd. To resolve, rename isFMAFasterThanMulAndAdd to isFMAFasterThanFMulAndFAdd and ensure that it returns true for any type that eventually legalizes to a type for which FMAs are faster than FMul + FAdd. For all in-tree targets, this simply requires checking any subtarget prerequisites (if any) and then checking the scalar type of the EVT passed to the function...however, it may be more complicated in other situations (for example, if FMAs are supported on a scalar type but not on a legal vectorized type, and the fmul + fadd on the vector is faster than scalarizing and using the scalar FMA...) Please let me know if there any questions or concerns. Stephen On Mon, Jul 8, 2013 at 3:10 PM, Stephen Lin wrote: > Hello, > > To any out-of-tree targets, please be aware that I intend to commit a > patch that will break the build of any target implementing > TargetLoweringBase::isFMAFasterThanMulAndAdd, for the reasons > described below. (Basically, the current interface definition is > broken and not followed, and no in-tree target was doing the right > thing with it, so it is unlikely any out-of-tree target is either...) 
> > To un-break your build after this patch goes through, you will need to > rename isFMAFasterThanMulAndAdd to isFMAFasterThanFMulAndFAdd and > ensure that it returns true for any type that eventually legalizes to > a type for which FMAs are faster than FMul + FAdd (which usually means > you have hardware support of the operation.). You can look at in-tree > target implementations as an example. > > Please let me know if there are any objections before tomorrow morning. > > Stephen > > ---------- Forwarded message ---------- > From: Stephen Lin > Date: Sun, Jul 7, 2013 at 9:25 PM > Subject: [PATCH] Resolve issues with fmuladd intrinsic handling across > multiple backends > To: llvm-commits at cs.uiuc.edu > > > Hi, > > While working on another patch, I discovered multiple related issues > with fmuladd intrinsic handling which I believe the attached patch > resolves. > > Currently, the operation depends on the target's implementation of the > virtual function TargetLoweringBase::isFMAFasterThanMulAndAdd, the > comments of which currently claims: > > - /// isFMAFasterThanMulAndAdd - Return true if an FMA operation is faster than > - /// a pair of mul and add instructions. fmuladd intrinsics will be > expanded to > - /// FMAs when this method returns true (and FMAs are legal), > otherwise fmuladd > - /// is expanded to mul + add. > > The "and FMAs are legal" portion of the above comment is simply a lie; > the legality of FMA operations is not checked before lowering fmuladd > to ISD::FMA; however, the AArch64, SystemZ, and X86 implementations of > this function are coded assuming that legality is checked and thus > simply return true. This results in the following issues: > > 1. 
On X86(-64) targets, ISD::FMA nodes are formed when lowering > fmuladd intrinsics even if the subtarget does not support FMA > instructions, leading to laughably bad code generation in some > situations (for example, compiling a call to "@llvm.fmuladd.v16f32(<16 > x float> %a, <16 x float> %b, <16 x float> %c)" without "-mattr=+fma" > or "-mattr=+fma4" results in 16 calls for the fmaf libm function > instead of AVX muls and adds. > > 2. On AArch64 targets, ISD::FMA nodes are formed for operations on > fp128, resulting in a call to a software fp128 FMA implementation > rather than a software fp128 multiply and a software fp128 add. This > does not seem to be the intended behavior given the comment above; > however, I am not sure if this is actually better or worse, since > neither set of operations is supported in hardware...does anyone know > which is likely to be faster? > > However, on further investigation, it seems like not checking the > legality of an FMA operation in lowering fmuladd intrinsics is a > feature, not a bug, since it allows formation of FMAs with types like > v16f32, as long as they legalize (via splitting, scalarization, > promotion, etc.) to types that support FMAs; it turns out this case is > explicitly tested for in test/CodeGen/X86/wide-fma-contraction.ll. > > So the proper solution, it seems, is for > TargetLoweringInfo::isFMAFasterThanMulAndAdd implementations to check > types and preconditions rather than depending on the caller to do so. > The last in-tree target to implement this function, PowerPC, actually > does check types, however, it only checks for the specific legal > types, and therefore the following occurs: > > 3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics > on types like v2f32, v8f32, v4f64, etc., even though they promote, > split, scalarize, etc. to types that support hardware FMAs. 
> > This patch resolves all these issues by modifying the implementations > of this virtual to check for subtarget features and for EVTs that > legalize to MVTs that support hardware FMAs. (If I've understood the > legalization rules correctly, then it turns out the latter can always > be done in these targets just by checking the scalar type, but this is > not the correct solution in the general case.) > Comments are adjusted accordingly. > > (For the AArch64 backend in particular, I made the assumption, for > now, that FMAs should not be formed on fp128, but this can be changed > with a one-line fix.) > > Since all current implementations of this virtual were buggy and the > comment describing its usage was incorrect, I've also decided to > rename it from "TargetLoweringBase::isFMAFasterThanMulAndAdd" to > "TargetLoweringBase::isFMAFasterThanFMulAndFAdd" to force a merge > conflict for any out-of-tree target implementing it; the new name is > more accurate and consistent with the name of other functions in this > interface anyway*. I will send a message to LLVMDev describing the > change and the appropriate fixes required if/when it goes through. > (Suggestions for a better solution to this problem are welcome!) > > Also, I took this opportunity to modify DAGCombiner to only check for > type legality when forming ISD::FMA nodes (in unsafe math mode) after > legalization, since this is allowed by the new function interface > definition (as long as isFMAFasterThanFMulAndFAdd returns true for the > type). In practice with current target implementations this change > seems to be a no-op, since the FMA will be formed after legalization > anyway; however, there doesn't seem to be any harm in forming the FMA > earlier if possible. > > No current tests are affected by this change; I've added tests that > the specific symptoms described above are fixed as well as some other > sanity checks, just in case. > > Please review and let me know if you have any comments! 
In particular, > if anyone has any definite knowledge about what the correct behavior of > fmuladd of fp128 on AArch64 should be, please let me know. > > Thanks, > Stephen > > * fast integer FMA instructions are at least theoretically possible, I > think (?); however, the current comment already indicates that the > function is used specifically to lower fmuladd intrinsics so "FMul" > and "FAdd" are more accurate. From hfinkel at anl.gov Tue Jul 9 12:08:11 2013 From: hfinkel at anl.gov (Hal Finkel) Date: Tue, 9 Jul 2013 14:08:11 -0500 (CDT) Subject: [LLVMdev] Script for stressing llc In-Reply-To: <1846814652.10788863.1373396542421.JavaMail.root@alcf.anl.gov> Message-ID: <1718947528.10790691.1373396891368.JavaMail.root@alcf.anl.gov> Hi, I wrote a small script in order to stress test llc using test cases generated by llvm-stress. When it finds a case where llc seems to have crashed, it greps the output for Assertion, LLVM ERROR, etc., removes things that look like hex numbers and ID numbers, and then checksums the resulting text. In this way, it can automatically categorize different bugs into different subdirectories. I found this useful, and maybe you will too :) -Hal -- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: stress.sh Type: application/x-shellscript Size: 846 bytes Desc: not available URL: From micah.villmow at smachines.com Tue Jul 9 13:18:47 2013 From: micah.villmow at smachines.com (Micah Villmow) Date: Tue, 9 Jul 2013 20:18:47 +0000 Subject: [LLVMdev] llvm bay-area social in july In-Reply-To: References: Message-ID: <3947CD34E13C4F4AB2D94AD35AE3FE60070787FE@smi-exchange1.smi.local> I think the database got corrupted again, only shows 4 people showing up, but there were quite a few more last week.
Micah From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Nick Lewycky Sent: Tuesday, July 09, 2013 10:39 AM To: LLVM Developers Mailing List; clang-dev Developers Subject: Re: [LLVMdev] llvm bay-area social in july On 24 June 2013 18:58, Nick Lewycky > wrote: Hi everyone! Since the first Thursday in July lands on July 4th, I'll need to move the date. The new plan is to meet at the usual spot, but on the following Tuesday, July 9th. Reminder: the social is today! (Tuesday!? We had to skip Thursday July 4th, and this was the next day I expect everyone to be back from holiday vacations.) Please let us know if you'll be coming at: http://llvmbayarea.appspot.com/ Nick -------------- next part -------------- An HTML attachment was scrubbed... URL: From nlewycky at google.com Tue Jul 9 13:26:24 2013 From: nlewycky at google.com (Nick Lewycky) Date: Tue, 9 Jul 2013 13:26:24 -0700 Subject: [LLVMdev] llvm bay-area social in july In-Reply-To: <3947CD34E13C4F4AB2D94AD35AE3FE60070787FE@smi-exchange1.smi.local> References: <3947CD34E13C4F4AB2D94AD35AE3FE60070787FE@smi-exchange1.smi.local> Message-ID: On 9 July 2013 13:18, Micah Villmow wrote: > I think the database got corrupted again, only shows 4 people showing > up, but there were quite a few more last week. > Thanks. I added a memcache layer after it started getting a very large number of hits from a spammer (they can't post due to the captcha but that doesn't stop them from loading the page so many times it puts us out of database read quota!), and apparently there's a bug somewhere in there. I'll try to get it fixed before next month, but for now I can click a button and everybody's posts reappear. Nick PS. Micah, I saw a duplicate entry for you and chose to delete it. Just wanted you to know that that part isn't a glitch in the recovery process or something. 
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On > Behalf Of Nick Lewycky > Sent: Tuesday, July 09, 2013 10:39 AM > > To: LLVM Developers Mailing List; clang-dev Developers > Subject: Re: [LLVMdev] llvm bay-area social in july > > On 24 June 2013 18:58, Nick Lewycky wrote: > > Hi everyone! Since the first Thursday in July lands on July 4th, I'll need > to move the date. The new plan is to meet at the usual spot, but on the > following Tuesday, July 9th. > > Reminder: the social is today! (Tuesday!? We had to skip Thursday July > 4th, and this was the next day I expect everyone to be back from holiday > vacations.) > > Please let us know if you'll be coming at: > http://llvmbayarea.appspot.com/ > > Nick > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ofv at wanadoo.es Tue Jul 9 13:29:36 2013 From: ofv at wanadoo.es (=?utf-8?Q?=C3=93scar_Fuentes?=) Date: Tue, 09 Jul 2013 22:29:36 +0200 Subject: [LLVMdev] Problem Using libLLVM-3.3.so References: <3E81A568A9EB894A9BD6AE7CDD3AA903478A2DFD6B@BE198.mail.lan> <87ip0lp2ej.fsf@wanadoo.es> <3E81A568A9EB894A9BD6AE7CDD3AA903478A2DFDA1@BE198.mail.lan> Message-ID: <87ehb7pjdb.fsf@wanadoo.es> Rick Sullivan writes: > I've also tried running simulations without using libLLVM-3.3.so, but > instead statically linking the required LLVM .a libraries into the > component shared objects. This is not an ideal solution, because it > effectively bundles LLVM into each component, more than doubling the > size of each component on disk. However, this also produces a crash > with a more meaningful stack trace. Whether this is related to the > failures I'm seeing with the LLVM shared object - I don't know. Here > is the valgrind report: > > ==15093== Jump to the invalid address stated on the next line > ==15093== at 0x0: ???
> ==15093== by 0x285C2321: llvm::cl::parser<(anonymous namespace)::DefaultOnOff>::~parser() (CommandLine.h:629) > ==15093== by 0x285C23D9: llvm::cl::opt<(anonymous namespace)::DefaultOnOff, false, llvm::cl::parser<(anonymous namespace)::DefaultOnOff>::~opt() (CommandLine.h:1 > ==15093== by 0x285A3BD1: __tcf_5 (DwarfDebug.cpp:85) > ==15093== by 0x441867: __cxa_finalize (in /lib/tls/libc-2.3.4.so) > ==15093== by 0x282212F2: (within libCORTEXA9MP.mx_DBG.so) > ==15093== by 0x28EEBB05: (within libCORTEXA9MP.mx_DBG.so) > ==15093== by 0x518C41: _dl_close (in /lib/tls/libc-2.3.4.so) > ==15093== by 0x56DD59: dlclose_doit (in /lib/libdl-2.3.4.so) > ==15093== by 0x40966D: _dl_catch_error (in /lib/ld-2.3.4.so) > ==15093== by 0x56E2BA: _dlerror_run (in /lib/libdl-2.3.4.so) > ==15093== by 0x56DD89: dlclose (in /lib/libdl-2.3.4.so) It is crashing while destroying a static instance of cl::opt. The `this' pointer is null. I'll say memory corruption. Does Valgrind report something about that? LLVM has a lot of shared state. If you load several shared libraries, each containing its own copy of LLVM, and they interact one with each other, bad things may happen. Also, check that you are initializing LLVM on a sane way, and that you are terminating it the right way too (make sure that the LLVM objects you destroy are owned by you, and those that you don't are owned by LLVM.) Try to bisect the problematic area by removing functionality from your code. If you can replicate the crash by loading just one shared library that uses LLVM plus the valgrind report contains nothing about memory corruption, file a bug report. 
From micah.villmow at smachines.com Tue Jul 9 14:06:19 2013 From: micah.villmow at smachines.com (Micah Villmow) Date: Tue, 9 Jul 2013 21:06:19 +0000 Subject: [LLVMdev] Basic instructions for LLVM and Control Flow graph extraction In-Reply-To: References: Message-ID: <3947CD34E13C4F4AB2D94AD35AE3FE600707889C@smi-exchange1.smi.local> This isn't by itself too difficult, as I have done something similar recently, but does require some modifications of LLVM. The basic algorithm is simple: For each ISA instruction, create a new MachineInstr and add it to the current MachineBasicBlock. At each branch instruction, add it to the current MBB and add it to a list and create a new MBB. After creating your list of MBB, iterate through them and reconnect the successors based on branches and fall throughs. The problem is that what you are producing has no connection to the IR, and there are parts of LLVM that expect that link, specifically the printing/CFG dumping functions. From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Clay J Sent: Tuesday, July 09, 2013 10:36 AM To: llvmdev at cs.uiuc.edu Subject: [LLVMdev] Basic instructions for LLVM and Control Flow graph extraction I am currently attempting to learn how to use LLVM for control flow graph extraction on linux (Ubuntu). Basically, I need to be able to break down specific basic functions blocks from assembly code, and use it to make a CFG. Do any of you upstanding human beings have any knowledge or resources that could possibly assist me in this task? I apologize if this is a very basic question. I have already installed the proper files/programs. Thank you in advance. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Andrew.Jost at synopsys.com Tue Jul 9 14:43:32 2013 From: Andrew.Jost at synopsys.com (Andy Jost) Date: Tue, 9 Jul 2013 21:43:32 +0000 Subject: [LLVMdev] Error building compiler-rt Message-ID: Hi, I get the following error while building compiler-rt: /slowfs/msret_s1_us03/ajost/src/llvm-3.3.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_stoptheworld_linux.cc:315:22: error: no matching function for call to 'clone' pid_t tracer_pid = clone(TracerThread, tracer_stack.Bottom(), ^~~~~ /usr/include/bits/sched.h:71:12: note: candidate function not viable: requires 4 arguments, but 7 were provided extern int clone (int (*__fn) (void *__arg), void *__child_stack, Inside sched.h, clone is indeed declared with four arguments, but, interestingly, the man page for clone provides this prototype: #include <sched.h> int clone(int (*fn)(void *), void *child_stack, int flags, void *arg, ... /* pid_t *pid, struct user_desc *tls, pid_t *ctid */ ); I'm running RedHat EL 2.6.9-89.ELlargesmp without root privileges. Is this a bug in LLVM? Do I just have an old version of clone that's not supported by LLVM? I can try just removing the last three arguments from the compiler-rt source, but is that the best solution? If someone can point out a clean way to fix this, then I don't mind trying to contribute a patch (I would need to learn how). Also, is this something that autoconf should have detected? What should it have done about it? -Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: From Andrew.Jost at synopsys.com Tue Jul 9 15:00:08 2013 From: Andrew.Jost at synopsys.com (Andy Jost) Date: Tue, 9 Jul 2013 22:00:08 +0000 Subject: [LLVMdev] Error building compiler-rt In-Reply-To: References: Message-ID: Ok, after familiarizing myself with clone it appears to me this is a bug in compiler-rt. From the clone man page: In Linux 2.4 and earlier, clone() does not take arguments ptid, tls, and ctid.
The source file passes those arguments without any fencing to check the Linux version. Also, ptid, tls, and ctid are only used in conjunction with certain flags (e.g., CLONE_PARENT_SETTID), but none of those flags are set. It looks like the fix (for all Linux versions) would be to simply remove the last three arguments from the call. -Andy From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Andy Jost Sent: Tuesday, July 09, 2013 2:44 PM To: LLVMdev at cs.uiuc.edu Subject: [LLVMdev] Error building compiler-rt Hi, I get the following error while building compiler-rt: /slowfs/msret_s1_us03/ajost/src/llvm-3.3.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_stoptheworld_linux.cc:315:22: error: no matching function for call to 'clone' pid_t tracer_pid = clone(TracerThread, tracer_stack.Bottom(), ^~~~~ /usr/include/bits/sched.h:71:12: note: candidate function not viable: requires 4 arguments, but 7 were provided extern int clone (int (*__fn) (void *__arg), void *__child_stack, Inside sched.h, clone is indeed declared with four arguments, but, interestingly, the man page for clone provides this prototype: #include int clone(int (*fn)(void *), void *child_stack, int flags, void *arg, ... /* pid_t *pid, struct user_desc *tls, pid_t *ctid */ ); I'm running RedHat EL 2.6.9-89.ELlargesmp without root privileges. Is this a bug in LLVM? Do I just have an old version of clone that's not supported by LLVM? I can try just removing the last three arguments from the compiler-rt source, but is that the best solution? If someone can point out a clean way to fix this, then I don't mind trying to contribute a patch (I would need to learn how). Also, is this something that autoconf should have detected? What should it have done about it? -Andy -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sundeepk at codeaurora.org Tue Jul 9 15:00:44 2013 From: sundeepk at codeaurora.org (sundeepk at codeaurora.org) Date: Tue, 9 Jul 2013 22:00:44 -0000 Subject: [LLVMdev] Floating point ordered and unordered comparisons Message-ID: <5236cf16e93285734b1b29a85c981727.squirrel@www.codeaurora.org> Hi All, I noticed the LLVM target-independent side is converting an ordered less-than "setolt" into an unordered greater-than-or-equal "setuge" operation. There are no target hooks to control going from the ordered mode into unordered. I am trying to figure out the best way to support unordered operations on Hexagon. We don't have a single instruction to do an unordered operation, so we will have to break it down into 2 instructions - a check for unordered followed by the actual operation. I looked at X86 and ARM, and it seems like both targets support unordered comparisons. I would prefer that the target-independent part not transform ordered ops into unordered ones. Is that a good idea? How do other targets support this feature? I don't have a lot of experience dealing with floating point. I will really appreciate any help here. Thanks, Sundeep From garious at gmail.com Tue Jul 9 15:11:03 2013 From: garious at gmail.com (Greg Fitzgerald) Date: Tue, 9 Jul 2013 15:11:03 -0700 Subject: [LLVMdev] reproducing binaries on llvm.org Message-ID: Are the packaging scripts used to produce the clang+llvm binaries on llvm.org under version control? If so, can you please point me to them?
Thanks, Greg From eli.friedman at gmail.com Tue Jul 9 15:34:37 2013 From: eli.friedman at gmail.com (Eli Friedman) Date: Tue, 9 Jul 2013 15:34:37 -0700 Subject: [LLVMdev] Floating point ordered and unordered comparisons In-Reply-To: <5236cf16e93285734b1b29a85c981727.squirrel@www.codeaurora.org> References: <5236cf16e93285734b1b29a85c981727.squirrel@www.codeaurora.org> Message-ID: On Tue, Jul 9, 2013 at 3:00 PM, wrote: > Hi All, > > I noticed LLVM target independent side is converting an ordered less than > "setolt" into unordered greater than "setuge" operation. There are no > target hooks to control going from the ordered mode into unordered. > > I am trying to figure out the best way to support unordered operation on > Hexagon. We don't have a single instruction to do unordered operation. So > we will have to break it down into 2 instructions - check if unordered > followed by the actual operation. > > I looked at X86 and ARM and it seems like both targes support unordered > comparisons. I would prefer target independent part not to transform > ordered ops into unordered. Is it a good idea? How do other targets > support this feature? > > I don't have a lot of experience dealing with floating points. I will > really appreciate any help here. The function ISD::getSetCCInverse() would probably be useful for you here: you can use it to transform an unordered operation into an ordered operation. -Eli From arnaud.adegm at gmail.com Tue Jul 9 15:40:29 2013 From: arnaud.adegm at gmail.com (Arnaud A. de Grandmaison) Date: Wed, 10 Jul 2013 00:40:29 +0200 Subject: [LLVMdev] reproducing binaries on llvm.org In-Reply-To: References: Message-ID: <1965476.6UKsHg8PtK@fario> On Tuesday 09 July 2013 15:11:03 Greg Fitzgerald wrote: > Are the packaging scripts used to produce the clang+llvm binaries on > llvm.org under version control? If so, can you please point me to > them? 
The script used for building the binaries can be found under llvm : utils/release/test-release.sh The packages are mere tarballs of the installation dir. Cheers, > > Thanks, > Greg -- Arnaud A. de Grandmaison From garious at gmail.com Tue Jul 9 16:16:17 2013 From: garious at gmail.com (Greg Fitzgerald) Date: Tue, 9 Jul 2013 16:16:17 -0700 Subject: [LLVMdev] reproducing binaries on llvm.org In-Reply-To: <1965476.6UKsHg8PtK@fario> References: <1965476.6UKsHg8PtK@fario> Message-ID: > The script used for building the binaries can be found under llvm : > utils/release/test-release.sh Thanks! A few follow-up questions: > # Phase 3: Build llvmCore with newly built clang from phase 2. > c_compiler="$gcc_compiler -fplugin=$dragonegg_phase2_objdir/dragonegg.so" > cxx_compiler="$gxx_compiler -fplugin=$dragonegg_phase2_objdir/dragonegg.so" Should that say "newly built dragonegg", not "newly built clang"? > echo "# Creating symlinks" Is it possible to configure the build without modifying the llvm.src directory (by creating those symlinks)? I see there is a '--with-clang' option. Is there a general mechanism for pointing the build to external directories similar to the one in the CMake build? LLVM_EXTERNAL_<project>_SOURCE_DIR > $BuildDir/llvm.src/configure --prefix=$InstallDir Is there a version of this script that configures the build with CMake?
Thanks, Greg From ahmed.bougacha at gmail.com Tue Jul 9 16:17:04 2013 From: ahmed.bougacha at gmail.com (Ahmed Bougacha) Date: Tue, 9 Jul 2013 16:17:04 -0700 Subject: [LLVMdev] Basic instructions for LLVM and Control Flow graph extraction In-Reply-To: <3947CD34E13C4F4AB2D94AD35AE3FE600707889C@smi-exchange1.smi.local> References: <3947CD34E13C4F4AB2D94AD35AE3FE600707889C@smi-exchange1.smi.local> Message-ID: On Tue, Jul 9, 2013 at 2:06 PM, Micah Villmow wrote: > This isn’t by itself too difficult, as I have done something similar > recently, but does require some modifications of LLVM. By the way there’s some stuff in LLVM that creates an MC CFG (MCModule, MCObjectDisassembler, ..), but it still needs a lot of work to be reliable and work in more cases - I have some patches locally that need some more work and that I’ll eventually push though. It gets tricky when you want to really have basic blocks, without duplicating subsets of the instructions when you discover an entry point in a basic block you already created. It’s even trickier when you consider jumping inside an instruction, and needing to join an existing basic block. For instance if you jump to an instruction that starts at address X and takes up 7 bytes, but disassembling at address X+5 gives you a valid 2 byte instruction, then you need to have a basic block with the 7byte instruction, another with the 2byte one, and both having the basic block starting at X+7 as a successor. If you want to do some quick experimentation, you can use "llvm-objdump -cfg -d ”, which gives you a CFG for each function found in the binary in a separate graphviz dot file. It doesn’t look at the object file format stuff (symbols, or fancier things like the FUNCTION_STARTS load command on mach-o), but again, I’ll get around to all this eventually. Until then, patches welcome ! — Ahmed > The basic algorithm is simple: > > For each ISA instruction, create a new MachineInstr and add it to the > current MachineBasicBlock. 
> > At each branch instruction, add it to the current MBB and add it to a list > and create a new MBB. > > After creating your list of MBB, iterate through them and reconnect the > successors based on branches and fall throughs. > > > > The problem is that what you are producing has no connection to the IR, and > there are parts of LLVM that expect that link, specifically the printing/CFG > dumping functions. > > > > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On > Behalf Of Clay J > Sent: Tuesday, July 09, 2013 10:36 AM > To: llvmdev at cs.uiuc.edu > Subject: [LLVMdev] Basic instructions for LLVM and Control Flow graph > extraction > > > > I am currently attempting to learn how to use LLVM for control flow graph > extraction on linux (Ubuntu). Basically, I need to be able to break down > specific basic functions blocks from assembly code, and use it to make a > CFG. > > Do any of you upstanding human beings have any knowledge or resources that > could possibly assist me in this task? > > I apologize if this is a very basic question. I have already installed the > proper files/programs. > > Thank you in advance. > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From sundeepk at codeaurora.org Tue Jul 9 20:21:14 2013 From: sundeepk at codeaurora.org (sundeepk at codeaurora.org) Date: Wed, 10 Jul 2013 03:21:14 -0000 Subject: [LLVMdev] Floating point ordered and unordered comparisons In-Reply-To: References: <5236cf16e93285734b1b29a85c981727.squirrel@www.codeaurora.org> Message-ID: <6c5069011f286284929742c2d7f9dd99.squirrel@www.codeaurora.org> > The function ISD::getSetCCInverse() would probably be useful for you > here: you can use it to transform an unordered operation into an > ordered operation. Thanks for your reply Eli. I will check how to convert unordered operations back to ordered one. 
I have another related question - is it possible for frontend (clang) to generate unordered operation from the source code? -Sundeep From zdevito at gmail.com Tue Jul 9 21:01:48 2013 From: zdevito at gmail.com (Zach Devito) Date: Tue, 9 Jul 2013 21:01:48 -0700 Subject: [LLVMdev] unaligned AVX store gets split into two instructions Message-ID: I'm seeing a difference in how LLVM 3.3 and 3.2 emit unaligned vector loads on AVX. 3.3 is splitting up an unaligned vector load but in 3.2, it was emitted as a single instruction (details below). In a matrix-matrix inner-kernel, I see a ~25% decrease in performance, which seems to be due to this. Any ideas why this changed? Thanks! Zach LLVM Code: define <4 x double> @vstore(<4 x double>*) { entry: %1 = load <4 x double>* %0, align 8 ret <4 x double> %1 } ------------------------------------------------------------ Running llvm-32/bin/llc vstore.ll creates: .section __TEXT,__text,regular,pure_instructions .globl _vstore .align 4, 0x90 _vstore: ## @vstore .cfi_startproc ## BB#0: ## %entry pushq %rbp Ltmp2: .cfi_def_cfa_offset 16 Ltmp3: .cfi_offset %rbp, -16 movq %rsp, %rbp Ltmp4: .cfi_def_cfa_register %rbp vmovups (%rdi), %ymm0 popq %rbp ret .cfi_endproc ---------------------------------------------------------------- Running llvm-33/bin/llc vstore.ll creates: .section __TEXT,__text,regular,pure_instructions .globl _main .align 4, 0x90 _main: ## @main .cfi_startproc ## BB#0: ## %entry vmovups (%rdi), %xmm0 vinsertf128 $1, 16(%rdi), %ymm0, %ymm0 ret .cfi_endproc .subsections_via_symbols -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From eli.friedman at gmail.com Tue Jul 9 21:55:23 2013 From: eli.friedman at gmail.com (Eli Friedman) Date: Tue, 9 Jul 2013 21:55:23 -0700 Subject: [LLVMdev] Floating point ordered and unordered comparisons In-Reply-To: <6c5069011f286284929742c2d7f9dd99.squirrel@www.codeaurora.org> References: <5236cf16e93285734b1b29a85c981727.squirrel@www.codeaurora.org> <6c5069011f286284929742c2d7f9dd99.squirrel@www.codeaurora.org> Message-ID: On Tue, Jul 9, 2013 at 8:21 PM, wrote: >> The function ISD::getSetCCInverse() would probably be useful for you >> here: you can use it to transform an unordered operation into an >> ordered operation. > > Thanks for your reply Eli. I will check how to convert unordered > operations back to ordered one. I have another related question - is it > possible for frontend (clang) to generate unordered operation from the > source code? Some builtins like __builtin_isnormal will generate them. -Eli From tom at stellard.net Tue Jul 9 21:57:13 2013 From: tom at stellard.net (Tom Stellard) Date: Tue, 9 Jul 2013 21:57:13 -0700 Subject: [LLVMdev] unaligned AVX store gets split into two instructions In-Reply-To: References: Message-ID: <20130710045713.GB2426@L7-CNU1252LKR-172027226155.amd.com> On Tue, Jul 09, 2013 at 09:01:48PM -0700, Zach Devito wrote: > I'm seeing a difference in how LLVM 3.3 and 3.2 emit unaligned vector loads > on AVX. > 3.3 is splitting up an unaligned vector load but in 3.2, it was emitted as > a single instruction (details below). > In a matrix-matrix inner-kernel, I see a ~25% decrease in performance, > which seems to be due to this. > > Any ideas why this changed? Thanks! > Hi Zack, I ran into a similar problem with the R600 backend, and I was able to fix it by implementing the TargetLowering::allowsUnalignedMemoryAccesses(). Take a look at r184822. 
-Tom > Zach > > LLVM Code: > define <4 x double> @vstore(<4 x double>*) { > entry: > %1 = load <4 x double>* %0, align 8 > ret <4 x double> %1 > } > ------------------------------------------------------------ > Running llvm-32/bin/llc vstore.ll creates: > .section __TEXT,__text,regular,pure_instructions > .globl _vstore > .align 4, 0x90 > _vstore: ## @vstore > .cfi_startproc > ## BB#0: ## %entry > pushq %rbp > Ltmp2: > .cfi_def_cfa_offset 16 > Ltmp3: > .cfi_offset %rbp, -16 > movq %rsp, %rbp > Ltmp4: > .cfi_def_cfa_register %rbp > vmovups (%rdi), %ymm0 > popq %rbp > ret > .cfi_endproc > ---------------------------------------------------------------- > Running llvm-33/bin/llc vstore.ll creates: > .section __TEXT,__text,regular,pure_instructions > .globl _main > .align 4, 0x90 > _main: ## @main > .cfi_startproc > ## BB#0: ## %entry > vmovups (%rdi), %xmm0 > vinsertf128 $1, 16(%rdi), %ymm0, %ymm0 > ret > .cfi_endproc > > > .subsections_via_symbols > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From eli.friedman at gmail.com Tue Jul 9 22:01:33 2013 From: eli.friedman at gmail.com (Eli Friedman) Date: Tue, 9 Jul 2013 22:01:33 -0700 Subject: [LLVMdev] unaligned AVX store gets split into two instructions In-Reply-To: References: Message-ID: On Tue, Jul 9, 2013 at 9:01 PM, Zach Devito wrote: > I'm seeing a difference in how LLVM 3.3 and 3.2 emit unaligned vector loads > on AVX. > 3.3 is splitting up an unaligned vector load but in 3.2, it was emitted as a > single instruction (details below). > In a matrix-matrix inner-kernel, I see a ~25% decrease in performance, which > seems to be due to this. > > Any ideas why this changed? Thanks! This was intentional; apparently doing it with two instructions is supposed to be faster. See r172868/r172894. Adding Nadav in case he has anything more to say. 
-Eli From nrotem at apple.com Tue Jul 9 22:15:02 2013 From: nrotem at apple.com (Nadav Rotem) Date: Tue, 09 Jul 2013 22:15:02 -0700 Subject: [LLVMdev] unaligned AVX store gets split into two instructions In-Reply-To: References: Message-ID: <5CC241EC-A268-4A3C-994A-59888EAEF8EE@apple.com> Hi, Yes. On Sandybridge 256-bit loads/stores are double pumped. This means that they go in one after the other in two cycles. On Haswell the memory ports are wide enough to allow a 256bit memory operation in one cycle. So, on Sandybridge we split unaligned memory operations into two 128bit parts to allow them to execute in two separate ports. This is also what GCC and ICC do. It is very possible that the decision to split the wide vectors causes a regression. If the memory ports are busy it is better to double-pump them and save the cost of the insert/extract subvector. Unfortunately, during ISel we don’t have a good way to estimate port pressure. In any case, it is a good idea to revise the heuristics that I put in and to see if it matches the Sandybridge optimization guide. If I remember correctly the optimization guide does not have too much information on this, but Elena looked over it and said that it made sense. BTW, you can validate that this is the problem using the IACA tool. It performs static analysis on your binary and tells you where the critical path is. http://software.intel.com/en-us/articles/intel-architecture-code-analyzer Thanks, Nadav On Jul 9, 2013, at 10:01 PM, Eli Friedman wrote: > On Tue, Jul 9, 2013 at 9:01 PM, Zach Devito wrote: >> I'm seeing a difference in how LLVM 3.3 and 3.2 emit unaligned vector loads >> on AVX. >> 3.3 is splitting up an unaligned vector load but in 3.2, it was emitted as a >> single instruction (details below). >> In a matrix-matrix inner-kernel, I see a ~25% decrease in performance, which >> seems to be due to this. >> >> Any ideas why this changed? Thanks! 
> > This was intentional; apparently doing it with two instructions is > supposed to be faster. See r172868/r172894. > > Adding Nadav in case he has anything more to say. > > -Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From neleai at seznam.cz Tue Jul 9 22:32:08 2013 From: neleai at seznam.cz (Ondřej Bílka) Date: Wed, 10 Jul 2013 07:32:08 +0200 Subject: [LLVMdev] unaligned AVX store gets split into two instructions In-Reply-To: References: Message-ID: <20130710053208.GA6046@domone.kolej.mff.cuni.cz> On Tue, Jul 09, 2013 at 09:01:48PM -0700, Zach Devito wrote: > I'm seeing a difference in how LLVM 3.3 and 3.2 emit unaligned vector > loads on AVX. > 3.3 is splitting up an unaligned vector load but in 3.2, it was emitted as > a single instruction (details below). > In a matrix-matrix inner-kernel, I see a ~25% decrease in performance, > which seems to be due to this. > Any ideas why this changed? Thanks! What is the code and architecture? In most loops splitting makes code faster when run on Ivy Bridge; you could dig through the Intel optimization manual for that recommendation. Perhaps this code is a special case.
> Zach > LLVM Code: > define <4 x double> @vstore(<4 x double>*) { > entry: > %1 = load <4 x double>* %0, align 8 > ret <4 x double> %1 > } > ------------------------------------------------------------ > Running llvm-32/bin/llc vstore.ll creates: > .section __TEXT,__text,regular,pure_instructions > .globl _vstore > .align 4, 0x90 > _vstore: ## @vstore > .cfi_startproc > ## BB#0: ## %entry > pushq %rbp > Ltmp2: > .cfi_def_cfa_offset 16 > Ltmp3: > .cfi_offset %rbp, -16 > movq %rsp, %rbp > Ltmp4: > .cfi_def_cfa_register %rbp > vmovups (%rdi), %ymm0 > popq %rbp > ret > .cfi_endproc > ---------------------------------------------------------------- > Running llvm-33/bin/llc vstore.ll creates: > .section __TEXT,__text,regular,pure_instructions > .globl _main > .align 4, 0x90 > _main: ## @main > .cfi_startproc > ## BB#0: ## %entry > vmovups (%rdi), %xmm0 > vinsertf128 $1, 16(%rdi), %ymm0, %ymm0 > ret > .cfi_endproc > .subsections_via_symbols > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -- fat electrons in the lines From zdevito at stanford.edu Tue Jul 9 23:33:38 2013 From: zdevito at stanford.edu (Zach Devito) Date: Tue, 9 Jul 2013 23:33:38 -0700 Subject: [LLVMdev] unaligned AVX store gets split into two instructions In-Reply-To: <5CC241EC-A268-4A3C-994A-59888EAEF8EE@apple.com> References: <5CC241EC-A268-4A3C-994A-59888EAEF8EE@apple.com> Message-ID: Thanks for all the info! I'm still in the process of narrowing down the performance difference in my code. I'm no longer convinced it's related to the unaligned loads/stores alone, since extracting this part of the kernel makes the performance difference disappear.
I will try to narrow down what is going on and if it seems related LLVM, I will post an example. Thanks again, Zach On Tue, Jul 9, 2013 at 10:15 PM, Nadav Rotem wrote: > Hi, > > Yes. On Sandybridge 256-bit loads/stores are double pumped. This means > that they go in one after the other in two cycles. On Haswell the memory > ports are wide enough to allow a 256bit memory operation in one cycle. So, > on Sandybridge we split unaligned memory operations into two 128bit parts > to allow them to execute in two separate ports. This is also what GCC and > ICC do. > > It is very possible that the decision to split the wide vectors causes a > regression. If the memory ports are busy it is better to double-pump them > and save the cost of the insert/extract subvector. Unfortunately, during > ISel we don’t have a good way to estimate port pressure. In any case, it is > a good idea to revise the heuristics that I put in and to see if it matches > the Sandybridge optimization guide. If I remember correctly the > optimization guide does not have too much information on this, but Elena > looked over it and said that it made sense. > > BTW, you can validate that this is the problem using the IACA tool. It > performs static analysis on your binary and tells you where the critical > path is. > http://software.intel.com/en-us/articles/intel-architecture-code-analyzer > > Thanks, > Nadav > > > On Jul 9, 2013, at 10:01 PM, Eli Friedman wrote: > > On Tue, Jul 9, 2013 at 9:01 PM, Zach Devito wrote: > > I'm seeing a difference in how LLVM 3.3 and 3.2 emit unaligned vector loads > on AVX. > 3.3 is splitting up an unaligned vector load but in 3.2, it was emitted as > a > single instruction (details below). > In a matrix-matrix inner-kernel, I see a ~25% decrease in performance, > which > seems to be due to this. > > Any ideas why this changed? Thanks! > > > This was intentional; apparently doing it with two instructions is > supposed to be faster. See r172868/r172894. 
> > Adding Nadav in case he has anything more to say. > > -Eli > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From elena.demikhovsky at intel.com Wed Jul 10 00:50:55 2013 From: elena.demikhovsky at intel.com (Demikhovsky, Elena) Date: Wed, 10 Jul 2013 07:50:55 +0000 Subject: [LLVMdev] unaligned AVX store gets split into two instructions In-Reply-To: <5CC241EC-A268-4A3C-994A-59888EAEF8EE@apple.com> References: <5CC241EC-A268-4A3C-994A-59888EAEF8EE@apple.com> Message-ID: Send me a pointer to the code, I'll check performance for our workloads. - Elena From: Nadav Rotem [mailto:nrotem at apple.com] Sent: Wednesday, July 10, 2013 08:15 To: Eli Friedman Cc: Zach Devito; LLVM Developers Mailing List; Demikhovsky, Elena Subject: Re: [LLVMdev] unaligned AVX store gets split into two instructions Hi, Yes. On Sandybridge 256-bit loads/stores are double pumped. This means that they go in one after the other in two cycles. On Haswell the memory ports are wide enough to allow a 256bit memory operation in one cycle. So, on Sandybridge we split unaligned memory operations into two 128bit parts to allow them to execute in two separate ports. This is also what GCC and ICC do. It is very possible that the decision to split the wide vectors causes a regression. If the memory ports are busy it is better to double-pump them and save the cost of the insert/extract subvector. Unfortunately, during ISel we don't have a good way to estimate port pressure. In any case, it is a good idea to revise the heuristics that I put in and to see if it matches the Sandybridge optimization guide. If I remember correctly the optimization guide does not have too much information on this, but Elena looked over it and said that it made sense. BTW, you can validate that this is the problem using the IACA tool. It performs static analysis on your binary and tells you where the critical path is. 
http://software.intel.com/en-us/articles/intel-architecture-code-analyzer Thanks, Nadav On Jul 9, 2013, at 10:01 PM, Eli Friedman > wrote: On Tue, Jul 9, 2013 at 9:01 PM, Zach Devito > wrote: I'm seeing a difference in how LLVM 3.3 and 3.2 emit unaligned vector loads on AVX. 3.3 is splitting up an unaligned vector load but in 3.2, it was emitted as a single instruction (details below). In a matrix-matrix inner-kernel, I see a ~25% decrease in performance, which seems to be due to this. Any ideas why this changed? Thanks! This was intentional; apparently doing it with two instructions is supposed to be faster. See r172868/r172894. Adding Nadav in case he has anything more to say. -Eli --------------------------------------------------------------------- Intel Israel (74) Limited This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. -------------- next part -------------- An HTML attachment was scrubbed... URL: From zdevito at stanford.edu Wed Jul 10 02:12:31 2013 From: zdevito at stanford.edu (Zach Devito) Date: Wed, 10 Jul 2013 02:12:31 -0700 Subject: [LLVMdev] unaligned AVX store gets split into two instructions In-Reply-To: References: <5CC241EC-A268-4A3C-994A-59888EAEF8EE@apple.com> Message-ID: I've narrowed this down to a single kernel (kernel.ll), which does a fixed-size matrix-matrix multiply: # ~/llvm-32-final/bin/llc kernel.ll -o kernel32.s # ~/llvm-33-final/bin/llc kernel.ll -o kernel33.s # ~/llvm-32-final/bin/clang++ harness.cpp kernel32.s -o harness32 # ~/llvm-32-final/bin/clang++ harness.cpp kernel33.s -o harness33 # time ./harness32 real 0m0.584s user 0m0.581s sys 0m0.001s # time ./harness33 real 0m0.730s user 0m0.725s sys 0m0.001s If you look at kernel33.s, it has a register spill/reload in the inner loop. 
This doesn't appear in the llvm 3.2 version and disappears from the 3.3 version if you remove the "align 8"s from kernel.ll which are making it unaligned. Do the two-instruction unaligned loads increase register pressure? Or is something else going on? Zach On Tue, Jul 9, 2013 at 11:33 PM, Zach Devito wrote: > Thanks for all the the info! I'm still in the process of narrowing down > the performance difference in my code. I'm no longer convinced its related > to only the unaligned loads/stores alone since extracting this part of the > kernel makes the performance difference disappear. I will try to narrow > down what is going on and if it seems related LLVM, I will post an example. > Thanks again, > > Zach > > > On Tue, Jul 9, 2013 at 10:15 PM, Nadav Rotem wrote: > >> Hi, >> >> Yes. On Sandybridge 256-bit loads/stores are double pumped. This means >> that they go in one after the other in two cycles. On Haswell the memory >> ports are wide enough to allow a 256bit memory operation in one cycle. So, >> on Sandybridge we split unaligned memory operations into two 128bit parts >> to allow them to execute in two separate ports. This is also what GCC and >> ICC do. >> >> It is very possible that the decision to split the wide vectors causes a >> regression. If the memory ports are busy it is better to double-pump them >> and save the cost of the insert/extract subvector. Unfortunately, during >> ISel we don’t have a good way to estimate port pressure. In any case, it is >> a good idea to revise the heuristics that I put in and to see if it matches >> the Sandybridge optimization guide. If I remember correctly the >> optimization guide does not have too much information on this, but Elena >> looked over it and said that it made sense. >> >> BTW, you can validate that this is the problem using the IACA tool. It >> performs static analysis on your binary and tells you where the critical >> path is. 
>> http://software.intel.com/en-us/articles/intel-architecture-code-analyzer >> >> Thanks, >> Nadav >> >> >> On Jul 9, 2013, at 10:01 PM, Eli Friedman wrote: >> >> On Tue, Jul 9, 2013 at 9:01 PM, Zach Devito wrote: >> >> I'm seeing a difference in how LLVM 3.3 and 3.2 emit unaligned vector >> loads >> on AVX. >> 3.3 is splitting up an unaligned vector load but in 3.2, it was emitted >> as a >> single instruction (details below). >> In a matrix-matrix inner-kernel, I see a ~25% decrease in performance, >> which >> seems to be due to this. >> >> Any ideas why this changed? Thanks! >> >> >> This was intentional; apparently doing it with two instructions is >> supposed to be faster. See r172868/r172894. >> >> Adding Nadav in case he has anything more to say. >> >> -Eli >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: harness.cpp Type: text/x-c++src Size: 346 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: kernel.ll Type: application/octet-stream Size: 6787 bytes Desc: not available URL: From arnaud.adegm at gmail.com Wed Jul 10 05:00:01 2013 From: arnaud.adegm at gmail.com (Arnaud A. de Grandmaison) Date: Wed, 10 Jul 2013 14:00:01 +0200 Subject: [LLVMdev] reproducing binaries on llvm.org In-Reply-To: References: <1965476.6UKsHg8PtK@fario> Message-ID: <51DD4CC1.2000500@gmail.com> On 07/10/2013 01:16 AM, Greg Fitzgerald wrote: >> The script used for building the binaries can be found under llvm : >> utils/release/test-release.sh > Thanks! A few follow-up questions: > > >> # Phase 3: Build llvmCore with newly built clang from phase 2. >> c_compiler="$gcc_compiler -fplugin=$dragonegg_phase2_objdir/dragonegg.so" >> cxx_compiler="$gxx_compiler -fplugin=$dragonegg_phase2_objdir/dragonegg.so" > Should that say "newly built dragonegg", not "newly built clang"? It seems so. 
I saw Duncan just committed a fix to the comment. > > >> echo "# Creating symlinks" > Is it possible to configure the build without modifying the llvm.src > directory (by creating those symlinks)? I see there is a > '--with-clang' option. Is there a general mechanism for pointing the > build to external directories similar to the one in the CMake build? > > LLVM_EXTERNAL__SOURCE_DIR > I am not a user of the configure script, so I cannot tell. For my everyday use of llvm, I use the CMake build. I just use test-release.sh when doing a release so that all binaries from llvm.org are built the same way. The configure & cmake build systems are not always equivalent. There have been numerous threads on the mailing list already on this subject. >> $BuildDir/llvm.src/configure --prefix=$InstallDir > Is there a version of this script that configures the build with CMake? Not to my knowledge. Patches are welcome :) This script is used for releasing the binaries at llvm.org. I think there are some virtues in ensuring people use the same way of building them. On the other hand, for individuals who wish to make their own binaries, there could be a different script, with some more choices / options. On the third hand ;-), the binaries on llvm.org are more intended for quick off-the-shelf testing : distributions (debian, ...) are expected to package the binaries differently --- and in a much more complex way --- so that it integrates smoothly in their packaging system. Cheers, -- Arnaud A. de Grandmaison > > Thanks, > Greg -- Arnaud A.
de Grandmaison From raghavendrak at huawei.com Wed Jul 10 07:50:12 2013 From: raghavendrak at huawei.com (Raghavendra K) Date: Wed, 10 Jul 2013 14:50:12 +0000 Subject: [LLVMdev] Getting struct member attributes Message-ID: Hi, I need some help, #define OPT #define MAN struct A { int i; char* c; }; struct B { OPT A a; MAN int i; }; After parsing the above .h file, how do I get the attributes of B's members, specifically member A, which is prefixed with OPT... So far I am able to get the type of a as A, but not OPT... as it might be preprocessed, and since it is empty it may be discarded or... regards ragha From adasgupt at codeaurora.org Wed Jul 10 08:10:41 2013 From: adasgupt at codeaurora.org (Anshuman Dasgupta) Date: Wed, 10 Jul 2013 10:10:41 -0500 Subject: [LLVMdev] Hexagon buildbots enabled on lab.llvm.org Message-ID: <51DD7971.9080702@codeaurora.org> Hi llvmdev, I wanted to announce that we have added a pair of Hexagon buildbots to lab.llvm.org. We had been working on ensuring all the tests pass and getting the buildbot infrastructure ready. The buildbots were enabled yesterday afternoon. The buildbots are: http://lab.llvm.org:8011/builders/llvm-hexagon-elf and http://lab.llvm.org:8011/builders/clang-hexagon-elf Thanks Rick, Jyotsna, and Galina for helping set the buildbots up! Please feel free to email Rick or me if you have any questions on the Hexagon buildbots. -Anshu --- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation From baldrick at free.fr Wed Jul 10 09:41:55 2013 From: baldrick at free.fr (Duncan Sands) Date: Wed, 10 Jul 2013 18:41:55 +0200 Subject: [LLVMdev] Getting struct member attributes In-Reply-To: References: Message-ID: <51DD8ED3.80105@free.fr> Hi ragha, as this is a clang question I suggest you ask on the clang mailing list. Best wishes, Duncan.
On 10/07/13 16:50, Raghavendra K wrote: > > Hi, > > I need a help, > > #define OPT > #define MAN > > struct A > { > int i; > char* c; > }; > > > struct B > { > OPT A a; > MAN int i; > }; > > After parsing the above .h file, how to get the attributes of B members specifically > for member A which is prefixed with OPT... > > So far am able to get type of a as A but unable to OPT...as it might be preprocessed and as it is empty > it may discarded or... > > > > regards > ragha > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From artagnon at gmail.com Wed Jul 10 10:20:41 2013 From: artagnon at gmail.com (Ramkumar Ramachandra) Date: Wed, 10 Jul 2013 22:50:41 +0530 Subject: [LLVMdev] [BUG] Support for .cfi_startproc simple Message-ID: Hi, According to the GNU as documentation [1], .cfi_startproc simple is a valid directive. Unfortunately, LLVM complains with: error: invalid instruction mnemonic 'simple' I happened to notice it while attempting to compile linux.git with clang: see arch/x86/ia32/ia32entry.S:75. [1]: http://sourceware.org/binutils/docs-2.22/as/Pseudo-Ops.html#Pseudo-Ops Thanks. From artagnon at gmail.com Wed Jul 10 11:12:37 2013 From: artagnon at gmail.com (Ramkumar Ramachandra) Date: Wed, 10 Jul 2013 23:42:37 +0530 Subject: [LLVMdev] [BUG] Support unqualified btr, bts Message-ID: Hi, I happened to notice that linux.git uses plenty of btr and bts instructions (not btrl, btrw, btsl, btsw). For examples, see arch/x86/include/asm/bitops.h. LLVM barfs on these due to ambiguity, while GNU as is fine with them. Surely, there must be architectures where the w/l variant is unavailable? LLVM must support those architectures, no? Thanks. 
From artagnon at gmail.com Wed Jul 10 11:16:26 2013 From: artagnon at gmail.com (Ramkumar Ramachandra) Date: Wed, 10 Jul 2013 23:46:26 +0530 Subject: [LLVMdev] [BUG] Support for -W[no-]unused-but-set-{variable, parameter} Message-ID: Hi, These warnings are included by default with -Wall in GCC 4.6 [1], and LLVM should support them instead of throwing -Wunknown-warning-option. [1]: http://gcc.gnu.org/gcc-4.6/porting_to.html Thanks. From eli.friedman at gmail.com Wed Jul 10 11:21:57 2013 From: eli.friedman at gmail.com (Eli Friedman) Date: Wed, 10 Jul 2013 11:21:57 -0700 Subject: [LLVMdev] [BUG] Support for -W[no-]unused-but-set-{variable, parameter} In-Reply-To: References: Message-ID: On Wed, Jul 10, 2013 at 11:16 AM, Ramkumar Ramachandra wrote: > Hi, > > These warnings are included by default with -Wall in GCC 4.6 [1], and > LLVM should support them instead of throwing -Wunknown-warning-option. > > [1]: http://gcc.gnu.org/gcc-4.6/porting_to.html Please file bug reports at llvm.org/bugs/ -Eli From grosbach at apple.com Wed Jul 10 11:20:15 2013 From: grosbach at apple.com (Jim Grosbach) Date: Wed, 10 Jul 2013 11:20:15 -0700 Subject: [LLVMdev] [BUG] Support for .cfi_startproc simple In-Reply-To: References: Message-ID: http://llvm.org/bugs/ On Jul 10, 2013, at 10:20 AM, Ramkumar Ramachandra wrote: > Hi, > > According to the GNU as documentation [1], .cfi_startproc simple is a > valid directive. Unfortunately, LLVM complains with: > > error: invalid instruction mnemonic 'simple' > > I happened to notice it while attempting to compile linux.git with > clang: see arch/x86/ia32/ia32entry.S:75. > > [1]: http://sourceware.org/binutils/docs-2.22/as/Pseudo-Ops.html#Pseudo-Ops > > Thanks. > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From eli.friedman at gmail.com Wed Jul 10 11:22:31 2013 From: eli.friedman at gmail.com (Eli Friedman) Date: Wed, 10 Jul 2013 11:22:31 -0700 Subject: [LLVMdev] [BUG] Support for .cfi_startproc simple In-Reply-To: References: Message-ID: On Wed, Jul 10, 2013 at 10:20 AM, Ramkumar Ramachandra wrote: > Hi, > > According to the GNU as documentation [1], .cfi_startproc simple is a > valid directive. Unfortunately, LLVM complains with: > > error: invalid instruction mnemonic 'simple' > > I happened to notice it while attempting to compile linux.git with > clang: see arch/x86/ia32/ia32entry.S:75. > > [1]: http://sourceware.org/binutils/docs-2.22/as/Pseudo-Ops.html#Pseudo-Ops Please file bugs at llvm.org/bugs/ From eli.friedman at gmail.com Wed Jul 10 11:30:03 2013 From: eli.friedman at gmail.com (Eli Friedman) Date: Wed, 10 Jul 2013 11:30:03 -0700 Subject: [LLVMdev] [BUG] Support unqualified btr, bts In-Reply-To: References: Message-ID: On Wed, Jul 10, 2013 at 11:12 AM, Ramkumar Ramachandra wrote: > Hi, > > I happened to notice that linux.git uses plenty of btr and bts > instructions (not btrl, btrw, btsl, btsw). For examples, see > arch/x86/include/asm/bitops.h. LLVM barfs on these due to ambiguity, > while GNU as is fine with them. Surely, there must be architectures > where the w/l variant is unavailable? Both variants have existed since the Intel 386. That said, we should probably handle this like GNU as because the variants behave almost identically. Please file a bug. -Eli From dblaikie at gmail.com Wed Jul 10 11:31:59 2013 From: dblaikie at gmail.com (David Blaikie) Date: Wed, 10 Jul 2013 11:31:59 -0700 Subject: [LLVMdev] [BUG] Support for -W[no-]unused-but-set-{variable, parameter} In-Reply-To: References: Message-ID: FWIW I'd sort of prefer just to have a generalized dead store warning (special casing for parameters doesn't seem all that important - though I could be wrong). 
On Wed, Jul 10, 2013 at 11:16 AM, Ramkumar Ramachandra wrote: > Hi, > > These warnings are included by default with -Wall in GCC 4.6 [1], and > LLVM should support them instead of throwing -Wunknown-warning-option. > > [1]: http://gcc.gnu.org/gcc-4.6/porting_to.html > > Thanks. > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From eirini_dit at windowslive.com Wed Jul 10 11:35:37 2013 From: eirini_dit at windowslive.com (Eirini _) Date: Wed, 10 Jul 2013 21:35:37 +0300 Subject: [LLVMdev] lower-level IR (A-normal form) Message-ID: Hi, I would like to ask you if I can get a lower-level representation than the LLVM IR. For example, having the following instruction in the LLVM IR, call void @llvm.memcpy.i32(i8* %19, i8* getelementptr inbounds ([2 x [2 x [3 x i8]]]* @main.s, i32 0, i32 0, i32 0, i32 0), i32 12, i32 1) I would like to get something like this in A-normal form (without nested instructions): %temp = i8* getelementptr inbounds ([2 x [2 x [3 x i8]]]* @main.s, i32 0, i32 0, i32 0, i32 0) call void @llvm.memcpy.i32(i8* %19, %temp, i32 12, i32 1) Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From artagnon at gmail.com Wed Jul 10 11:38:51 2013 From: artagnon at gmail.com (Ramkumar Ramachandra) Date: Thu, 11 Jul 2013 00:08:51 +0530 Subject: [LLVMdev] [BUG] Support for -W[no-]unused-but-set-{variable, parameter} In-Reply-To: References: Message-ID: Eli Friedman wrote: > Please file bug reports at llvm.org/bugs/ Filed bugs for all three. I was hoping to fix these now, with some help.
From artagnon at gmail.com Wed Jul 10 11:43:12 2013 From: artagnon at gmail.com (Ramkumar Ramachandra) Date: Thu, 11 Jul 2013 00:13:12 +0530 Subject: [LLVMdev] [BUG] Support for -W[no-]unused-but-set-{variable, parameter} In-Reply-To: References: Message-ID: David Blaikie wrote: > FWIW I'd sort of prefer just to have a generalized dead store warning > (special casing for parameters doesn't seem all that important - > though I could be wrong). Does LLVM have some existing code to detect a dead store that I can just hook into to fix the bug? I agree that the {variable,parameter} might not be important in practice, but we need to be compatible with GCC. From dblaikie at gmail.com Wed Jul 10 11:48:08 2013 From: dblaikie at gmail.com (David Blaikie) Date: Wed, 10 Jul 2013 11:48:08 -0700 Subject: [LLVMdev] [BUG] Support for -W[no-]unused-but-set-{variable, parameter} In-Reply-To: References: Message-ID: On Wed, Jul 10, 2013 at 11:43 AM, Ramkumar Ramachandra wrote: > David Blaikie wrote: >> FWIW I'd sort of prefer just to have a generalized dead store warning >> (special casing for parameters doesn't seem all that important - >> though I could be wrong). > > Does LLVM have some existing code to detect a dead store that I can > just hook into to fix the bug? I agree that the {variable,parameter} > might not be important in practice, but we need to be compatible with > GCC. Not at the moment - I've had half a mind to implement a generalized dead store warning at some point. 
(& when I go to do so I'll start by looking at the -Wsometimes-uninitialized implementation to get a feel for how to visit loads/stores in the CFG, etc) From eli.friedman at gmail.com Wed Jul 10 12:00:19 2013 From: eli.friedman at gmail.com (Eli Friedman) Date: Wed, 10 Jul 2013 12:00:19 -0700 Subject: [LLVMdev] [BUG] Support for -W[no-]unused-but-set-{variable, parameter} In-Reply-To: References: Message-ID: On Wed, Jul 10, 2013 at 11:38 AM, Ramkumar Ramachandra wrote: > Eli Friedman wrote: >> Please file bug reports at llvm.org/bugs/ > > Filed bugs for all three. I was hoping to fix these now, with some help. Oh... I can give you some quick pointers, then: For the unused-but-set warnings, not sure how you'd want to go about implementing them... I'm not convinced those specific warnings are particularly useful, as opposed to a more general dead-store warning, but you could start digging around near Sema::DiagnoseUnusedParameters in clang/lib/Sema/SemaDecl.cpp. You probably want to send an email to cfe-dev, though; discussions about clang-specific stuff generally go there. For the bts issue, take a look at LLVM r126047. For CFI parsing, look at llvm/lib/MC/MCParser/AsmParser.cpp. If you're interested in contributing patches, you might also want to skim http://llvm.org/docs/DeveloperPolicy.html . -Eli From criswell at illinois.edu Wed Jul 10 12:24:51 2013 From: criswell at illinois.edu (John Criswell) Date: Wed, 10 Jul 2013 14:24:51 -0500 Subject: [LLVMdev] Problem Adding New Pass to Alias Analysis Group Message-ID: <51DDB503.3040909@illinois.edu> Dear All, I'm trying to add a new alias analysis to the alias analysis group in LLVM 3.2. This new pass is linked statically into a tool that lives outside the LLVM source tree, so I'm trying to avoid making patches to the LLVM sources.
I've added the INITIALIZE_AG_PASS_BEGIN() and INITIALIZE_AG_PASS_END() code to the pass, manually scheduled it before the MemoryDependenceAnalysis pass, and have tried making it a FunctionPass and an ImmutablePass, but no matter what I do, it seems like MemoryDependenceAnalysis and other passes keep using the -no-aa default pass instead. 1) Does anyone have ideas on how to verify that my pass is part of the alias analysis group? 2) Does anyone have any ideas on what I might be doing wrong? Any ideas would be appreciated. Thanks in advance, -- John T. From t.p.northover at gmail.com Wed Jul 10 12:25:20 2013 From: t.p.northover at gmail.com (Tim Northover) Date: Wed, 10 Jul 2013 20:25:20 +0100 Subject: [LLVMdev] lower-lever IR (A-normal form) In-Reply-To: References: Message-ID: Hi Eirini, > i would like to get something like this (in A-normal form (without nested > instructions): The nested instructions come from anything LLVM can identify as a constant (specifically, any value that's a subclass of Constant). I'm not aware of a pass to turn them back into non-constant Values, though one could clearly be written. It would probably count more as a pessimisation than an optimisation though. The Constants have a reasonable chance at resulting in no instructions at all, which I doubt the expanded form would. What are you trying to do where you think it would be an advantage? Perhaps there's a better way without compromising optimisations. Cheers. Tim. 
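The rewrite being asked about is mechanical: recursively hoist each nested operand into a fresh named temporary, then rewrite the enclosing instruction to use the temporaries. Below is a toy sketch over nested tuples — this is not LLVM's API (a real pass would materialize each ConstantExpr as an instruction through IRBuilder), and the tuple encoding is invented purely for illustration.

```python
import itertools

def flatten(expr, bindings, fresh):
    """Hoist every nested (op, operands...) tuple into a fresh %tN binding."""
    if not isinstance(expr, tuple):
        return expr  # atomic operand: a register, global, or literal
    op, *args = expr
    args = [flatten(a, bindings, fresh) for a in args]
    name = f"%t{next(fresh)}"
    bindings.append((name, (op, *args)))
    return name

def to_anf(stmt):
    """Return (bindings, stmt') with each nested operand of stmt replaced
    by a named temporary, i.e. the statement in A-normal form."""
    bindings, fresh = [], itertools.count(1)
    op, *args = stmt
    return bindings, (op, *[flatten(a, bindings, fresh) for a in args])

# The memcpy from the question, modelled as nested tuples:
call = ("call @llvm.memcpy.i32", "%19",
        ("getelementptr inbounds", "@main.s", "i32 0, i32 0, i32 0, i32 0"),
        "i32 12", "i32 1")
bindings, flat = to_anf(call)
for name, rhs in bindings:
    print(name, "=", rhs)  # the hoisted getelementptr becomes %t1
print(flat)                # the call now refers to %t1
```

Because `flatten` recurses before allocating a name, arbitrarily deep nesting comes out as a correctly ordered sequence of temporaries, which is the essence of A-normal form.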
From joerg at britannica.bec.de Wed Jul 10 12:25:32 2013 From: joerg at britannica.bec.de (Joerg Sonnenberger) Date: Wed, 10 Jul 2013 21:25:32 +0200 Subject: [LLVMdev] [BUG] Support unqualified btr, bts In-Reply-To: References: Message-ID: <20130710192532.GA5855@britannica.bec.de> On Wed, Jul 10, 2013 at 11:30:03AM -0700, Eli Friedman wrote: > On Wed, Jul 10, 2013 at 11:12 AM, Ramkumar Ramachandra > wrote: > > Hi, > > > > I happened to notice that linux.git uses plenty of btr and bts > > instructions (not btrl, btrw, btsl, btsw). For examples, see > > arch/x86/include/asm/bitops.h. LLVM barfs on these due to ambiguity, > > while GNU as is fine with them. Surely, there must be architectures > > where the w/l variant is unavailable? > > Both variants have existed since the Intel 386. > > That said, we should probably handle this like GNU as because the > variants behave almost identically. Please file a bug. I don't consider this a bug. Just like certain FP instructions, they *are* ambiguous and there is no reason to depend on magic assembler choices. Joerg From criswell at illinois.edu Wed Jul 10 12:33:56 2013 From: criswell at illinois.edu (John Criswell) Date: Wed, 10 Jul 2013 14:33:56 -0500 Subject: [LLVMdev] lower-lever IR (A-normal form) In-Reply-To: References: Message-ID: <51DDB724.8010603@illinois.edu> On 7/10/13 1:35 PM, Eirini _ wrote: > Hi, > > i would like to ask you, if i can get a lower-level representation > than the llvm IR. > For example, having the following instruction in the llvm IR, > call void @llvm.memcpy.i32(i8* %19, i8* getelementptr inbounds ([2 x > [2 x [3 x i8]]]* @main.s, i32 0, i32 0, i32 0, i32 0), i32 12, i32 1) As Tim has pointed out, the "nested" getelementptr isn't really a nested instruction. It is a compile-time constant. The reason why it looks like a GEP is that it is in a symbolic form. If you really want to convert these constants into instructions, take a look at the BreakConstantGEPs pass in SAFECode.
It converts some (but not all) constant expressions into LLVM instructions and will alleviate the "nesting" that you see. As Tim also said, this conversion is not an optimization. You should not expect the resulting IR to compile into efficient code. -- John T. > > i would like to get something like this (in _A-normal form_ (without > nested instructions): > *%temp* = i8* getelementptr inbounds ([2 x [2 x [3 x i8]]]* @main.s, > i32 0, i32 0, i32 0, i32 0) > call void @llvm.memcpy.i32(i8* %19,* %temp*, i32 12, i32 1) > > > Thanks > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From eli.friedman at gmail.com Wed Jul 10 13:18:41 2013 From: eli.friedman at gmail.com (Eli Friedman) Date: Wed, 10 Jul 2013 13:18:41 -0700 Subject: [LLVMdev] [PATCH] x86: disambiguate unqualified btr, bts In-Reply-To: <1373484576-20213-1-git-send-email-artagnon@gmail.com> References: <1373484576-20213-1-git-send-email-artagnon@gmail.com> Message-ID: On Wed, Jul 10, 2013 at 12:29 PM, Ramkumar Ramachandra wrote: > The instructions btr and bts are perfectly valid, and have existed since > Intel 386. GNU as supports them fine. Unfortunately, LLVM does not > support them, and barfs with: > > error: ambiguous instructions require an explicit suffix > > Fix this problem by disambiguating it correctly, following the example > set by 824a907. > > Cc: Eli Friedman > Cc: Chris Lattner > Signed-off-by: Ramkumar Ramachandra > --- > I've probably done something stupid; seems to build correctly, but > that's all I know. Also, tests are pending. 
> > lib/Target/X86/X86InstrInfo.td | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/lib/Target/X86/X86InstrInfo.td b/lib/Target/X86/X86InstrInfo.td > index f33ae2a..4950674 100644 > --- a/lib/Target/X86/X86InstrInfo.td > +++ b/lib/Target/X86/X86InstrInfo.td > @@ -1971,6 +1971,10 @@ def : InstAlias<"aam", (AAM8i8 10)>; > // Disambiguate the mem/imm form of bt-without-a-suffix as btl. > def : InstAlias<"bt $imm, $mem", (BT32mi8 i32mem:$mem, i32i8imm:$imm)>; > > +// Disambiguate btr and bts, just like GNU as. > +def : InstAlias<"btr $imm, $mem", (BT16mi8 i16mem:$mem, i16i8imm:$imm)>; > +def : InstAlias<"bts $imm, $mem", (BT16mi8 i16mem:$mem, i16i8imm:$imm)>; > + > // clr aliases. > def : InstAlias<"clrb $reg", (XOR8rr GR8 :$reg, GR8 :$reg)>; > def : InstAlias<"clrw $reg", (XOR16rr GR16:$reg, GR16:$reg)>; > -- > 1.8.3.2.736.g869de25 > Please send patches to llvm-commits. Please include a testcase with each patch. Please check that your patch actually works correctly before sending it to the mailing list for review. (See http://llvm.org/docs/DeveloperPolicy.html .) -Eli From tasoskalog at hotmail.com Wed Jul 10 11:32:58 2013 From: tasoskalog at hotmail.com (Tasos Kalogeropoulos) Date: Wed, 10 Jul 2013 21:32:58 +0300 Subject: [LLVMdev] lower-lever IR (A-normal form) Message-ID: Hi, i would like to ask you, if i can get a lower-level representation than the llvm IR.For example, having the following instruction in the llvm IR, call void @llvm.memcpy.i32(i8* %19, i8* getelementptr inbounds ([2 x [2 x [3 x i8]]]* @main.s, i32 0, i32 0, i32 0, i32 0), i32 12, i32 1) i would like to get something like this (in A-normal form (without nested instructions):%temp = i8* getelementptr inbounds ([2 x [2 x [3 x i8]]]* @main.s, i32 0, i32 0, i32 0, i32 0) call void @llvm.memcpy.i32(i8* %19, %temp, i32 12, i32 1) Thanks -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From grosbach at apple.com Wed Jul 10 13:41:08 2013 From: grosbach at apple.com (Jim Grosbach) Date: Wed, 10 Jul 2013 13:41:08 -0700 Subject: [LLVMdev] [PATCH] x86: disambiguate unqualified btr, bts In-Reply-To: References: <1373484576-20213-1-git-send-email-artagnon@gmail.com> Message-ID: <416B65E5-1813-4904-BBC0-882AD2F17712@apple.com> Also, please elaborate on why this is a good change. Because gas accepts it isn’t sufficient reason in and of itself. -Jim On Jul 10, 2013, at 1:18 PM, Eli Friedman wrote: > On Wed, Jul 10, 2013 at 12:29 PM, Ramkumar Ramachandra > wrote: >> The instructions btr and bts are perfectly valid, and have existed since >> Intel 386. GNU as supports them fine. Unfortunately, LLVM does not >> support them, and barfs with: >> >> error: ambiguous instructions require an explicit suffix >> >> Fix this problem by disambiguating it correctly, following the example >> set by 824a907. >> >> Cc: Eli Friedman >> Cc: Chris Lattner >> Signed-off-by: Ramkumar Ramachandra >> --- >> I've probably done something stupid; seems to build correctly, but >> that's all I know. Also, tests are pending. >> >> lib/Target/X86/X86InstrInfo.td | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/lib/Target/X86/X86InstrInfo.td b/lib/Target/X86/X86InstrInfo.td >> index f33ae2a..4950674 100644 >> --- a/lib/Target/X86/X86InstrInfo.td >> +++ b/lib/Target/X86/X86InstrInfo.td >> @@ -1971,6 +1971,10 @@ def : InstAlias<"aam", (AAM8i8 10)>; >> // Disambiguate the mem/imm form of bt-without-a-suffix as btl. >> def : InstAlias<"bt $imm, $mem", (BT32mi8 i32mem:$mem, i32i8imm:$imm)>; >> >> +// Disambiguate btr and bts, just like GNU as. >> +def : InstAlias<"btr $imm, $mem", (BT16mi8 i16mem:$mem, i16i8imm:$imm)>; >> +def : InstAlias<"bts $imm, $mem", (BT16mi8 i16mem:$mem, i16i8imm:$imm)>; >> + >> // clr aliases. 
>> def : InstAlias<"clrb $reg", (XOR8rr GR8 :$reg, GR8 :$reg)>; >> def : InstAlias<"clrw $reg", (XOR16rr GR16:$reg, GR16:$reg)>; >> -- >> 1.8.3.2.736.g869de25 >> > > Please send patches to llvm-commits. Please include a testcase with > each patch. Please check that your patch actually works correctly > before sending it to the mailing list for review. (See > http://llvm.org/docs/DeveloperPolicy.html .) > > -Eli > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From artagnon at gmail.com Wed Jul 10 13:41:15 2013 From: artagnon at gmail.com (Ramkumar Ramachandra) Date: Thu, 11 Jul 2013 02:11:15 +0530 Subject: [LLVMdev] [PATCH] x86: disambiguate unqualified btr, bts In-Reply-To: References: <1373484576-20213-1-git-send-email-artagnon@gmail.com> Message-ID: Eli Friedman wrote: >> I've probably done something stupid; seems to build correctly, but >> that's all I know. Also, tests are pending. > > Please send patches to llvm-commits. Please include a testcase with > each patch. Please check that your patch actually works correctly > before sending it to the mailing list for review. I know. I was asking for a sanity check before doing another iteration. Never mind: I'll submit another iteration soon. From cristiannomartins at gmail.com Wed Jul 10 13:43:46 2013 From: cristiannomartins at gmail.com (Cristianno Martins) Date: Wed, 10 Jul 2013 14:43:46 -0600 Subject: [LLVMdev] Problem Adding New Pass to Alias Analysis Group In-Reply-To: <51DDB503.3040909@illinois.edu> References: <51DDB503.3040909@illinois.edu> Message-ID: Hello John, What opt command line arguments are you using? If you follow this link, you can see that -no-aa is the default alias analysis implementation if you do not manually specify which AA passes you want to use. 
Note that you can pass as many different implementations of AA as you want, and each of them will be chained together for each function, like a pipeline, if the previous one was not able to determine if there is a dependence or not. Hope this helps, -- Cristianno Martins PhD Student of Computer Science University of Campinas cmartins at ic.unicamp.br On Wed, Jul 10, 2013 at 1:24 PM, John Criswell wrote: > Dear All, > > I'm trying to add a new alias analysis to the alias analysis group in LLVM > 3.2. This new pass is linked statically into a tool that lives outside the > LLVM source tree, so I'm trying to avoid making patches to the LLVM sources. > > I've added the INITIALIZE_AG_PASS_BEGIN() and INITIALIZE_AG_PASS_END() > code to the pass, manually scheduled it before the MemoryDependenceAnalysis > pass, and have tried making it a FunctionPass and an ImmutablePass, but no > matter what I do, it seems like MemoryDependenceAnalysis and other passes > keep using the -no-aa default pass instead. > > 1) Does anyone have ideas on how to verify that my pass is part of the > alias analysis group? > > 2) Does anyone have any ideas on what I might be doing wrong? > > Any ideas would be appreciated. > > Thanks in advance, > > -- John T. > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From artagnon at gmail.com Wed Jul 10 13:44:39 2013 From: artagnon at gmail.com (Ramkumar Ramachandra) Date: Thu, 11 Jul 2013 02:14:39 +0530 Subject: [LLVMdev] [PATCH] x86: disambiguate unqualified btr, bts In-Reply-To: <416B65E5-1813-4904-BBC0-882AD2F17712@apple.com> References: <1373484576-20213-1-git-send-email-artagnon@gmail.com> <416B65E5-1813-4904-BBC0-882AD2F17712@apple.com> Message-ID: Jim Grosbach wrote: > Also, please elaborate on why this is a good change.
Because gas accepts it > isn’t sufficient reason in and of itself. That they're valid instructions isn't sufficient reason? Should I additionally say that linux.git uses them? I wrote: > The instructions btr and bts are perfectly valid, and have existed since > Intel 386. From rafael.espindola at gmail.com Wed Jul 10 13:49:22 2013 From: rafael.espindola at gmail.com (=?UTF-8?Q?Rafael_Esp=C3=ADndola?=) Date: Wed, 10 Jul 2013 16:49:22 -0400 Subject: [LLVMdev] lower-lever IR (A-normal form) In-Reply-To: References: Message-ID: Note that the getelementptr in the example is a constant, not an instruction. On 10 July 2013 14:32, Tasos Kalogeropoulos wrote: > Hi, > > i would like to ask you, if i can get a lower-level representation than the > llvm IR. > For example, having the following instruction in the llvm IR, > call void @llvm.memcpy.i32(i8* %19, i8* getelementptr inbounds ([2 x [2 x > [3 x i8]]]* @main.s, i32 0, i32 0, i32 0, i32 0), i32 12, i32 1) > > i would like to get something like this (in A-normal form (without nested > instructions): > %temp = i8* getelementptr inbounds ([2 x [2 x [3 x i8]]]* @main.s, i32 0, > i32 0, i32 0, i32 0) > call void @llvm.memcpy.i32(i8* %19, %temp, i32 12, i32 1) > > > Thanks > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From grosbach at apple.com Wed Jul 10 13:53:04 2013 From: grosbach at apple.com (Jim Grosbach) Date: Wed, 10 Jul 2013 13:53:04 -0700 Subject: [LLVMdev] [PATCH] x86: disambiguate unqualified btr, bts In-Reply-To: References: <1373484576-20213-1-git-send-email-artagnon@gmail.com> <416B65E5-1813-4904-BBC0-882AD2F17712@apple.com> Message-ID: On Jul 10, 2013, at 1:44 PM, Ramkumar Ramachandra wrote: > Jim Grosbach wrote: >> Also, please elaborate on why this is a good change. Because gas accepts it >> isn’t sufficient reason in and of itself. 
> > That they're valid instructions isn't sufficient reason? Should I > additionally say that linux.git uses them? > Is the diagnostic incorrect? To say that another way, is the assembler correctly diagnosing a previously unnoticed problem in the project source code, or is the assembler not behaving correctly according to the documented Intel assembly mnemonics? If the former, the fix belongs in the project, not the assembler. If the latter, then we should absolutely fix the assembler. From your emails, it is unclear to me which is the case. -Jim > I wrote: >> The instructions btr and bts are perfectly valid, and have existed since >> Intel 386. -------------- next part -------------- An HTML attachment was scrubbed... URL: From criswell at illinois.edu Wed Jul 10 14:06:20 2013 From: criswell at illinois.edu (John Criswell) Date: Wed, 10 Jul 2013 16:06:20 -0500 Subject: [LLVMdev] Problem Adding New Pass to Alias Analysis Group In-Reply-To: References: <51DDB503.3040909@illinois.edu> Message-ID: <51DDCCCC.1020008@illinois.edu> On 7/10/13 3:43 PM, Cristianno Martins wrote: > Hello John, > > What opt command line arguments are you using? I'm not using opt. I'm manually scheduling a pipeline within a tool. The code looks like this: PassManager pm; MyAlias * aa = new MyAlias(); pm.add(aa); pm.add(new MyAliasUsingPass()); Both MyAlias and MyAliasUsingPass are now ModulePass'es. MyAlias is an alias analysis pass while MyAliasUsingPass is a pass that requires an alias analysis and performs a test query. The output of -debug-pass=Structure is the following: No Alias Analysis (always returns 'may' alias) ModulePass Manager MyAlias MyAliasUsingPass I've changed MyAlias to call abort() when it is queried, but the program never crashes when running MyAliasUsingPass, which indicates that my MyAlias is never being used for queries. I've also tried making MyAlias an ImmutablePass, but that didn't appear to work either.
> > If you follow this link > , > you can see that -no-aa is the default alias analysis implementation > if you do not manually specify which AA passes you want to use. Note > that you can pass as many different implementations of AA as you want, > and each of them will be chained together for each function, like a > pipeline, if the previous one was not able to determine if there is a > dependence or not. Yes, I am aware of how analysis groups are *supposed* to work. :) I'm just not getting the advertised functionality and am at a loss as to what I could be doing wrong. -- John T. > > Hope this help, > > > > -- > Cristianno Martins > PhD Student of Computer Science > University of Campinas > cmartins at ic.unicamp.br > > > On Wed, Jul 10, 2013 at 1:24 PM, John Criswell > wrote: > > Dear All, > > I'm trying to add a new alias analysis to the alias analysis group > in LLVM 3.2. This new pass is linked statically into a tool that > lives outside the LLVM source tree, so I'm trying to avoid making > patches to the LLVM sources. > > I've added the INITIALIZE_AG_PASS_BEGIN() and > INITIALIZE_AG_PASS_END() code to the pass, manually scheduled it > before the MemoryDependenceAnalysis pass, and have tried making it > a FunctionPass and an ImmutablePass, but no matter what I do, it > seems like MemoryDependenceAnalysis and other passes keep using > the -no-aa default pass instead. > > 1) Does anyone have ideas on how to verify that my pass is part of > the alias analysis group? > > 2) Does anyone have any ideas on what I might be doing wrong? > > Any ideas would be appreciated. > > Thanks in advance, > > -- John T. > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu > http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -------------- next part -------------- An HTML attachment was scrubbed... 
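The chaining behavior described above — each analysis either answers a query or delegates to the next, with a conservative default at the end of the chain — can be sketched in a few lines. This is a toy model with invented names, not LLVM's AnalysisGroup machinery.

```python
class ChainedAA:
    """Toy model of alias-analysis chaining: implementations are tried in
    order, and "may" means "I don't know, ask the next one"."""

    def __init__(self, impls):
        self.impls = impls  # ordered, most precise first

    def alias(self, a, b):
        for impl in self.impls:
            result = impl(a, b)
            if result != "may":
                return result
        return "may"  # like -no-aa: the conservative fallback

def globals_aa(a, b):
    # Pretends distinct globals never alias; punts on everything else.
    if a.startswith("@") and b.startswith("@"):
        return "no" if a != b else "must"
    return "may"

chain = ChainedAA([globals_aa])
print(chain.alias("@x", "@y"))  # 'no'  -- answered by globals_aa
print(chain.alias("%p", "@y"))  # 'may' -- falls through to the default
```

The point of the structure is that adding a more precise analysis to the front of the list only ever sharpens answers; queries it cannot resolve still reach the rest of the chain.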
URL: From artagnon at gmail.com Wed Jul 10 14:08:42 2013 From: artagnon at gmail.com (Ramkumar Ramachandra) Date: Thu, 11 Jul 2013 02:38:42 +0530 Subject: [LLVMdev] [PATCH] x86: disambiguate unqualified btr, bts In-Reply-To: References: <1373484576-20213-1-git-send-email-artagnon@gmail.com> <416B65E5-1813-4904-BBC0-882AD2F17712@apple.com> Message-ID: Jim Grosbach wrote: > To say that another way, is the assembler correctly diagnosing a previously > unnoticed problem in the project source code, or is the assembler not > behaving correctly according to the documented Intel assembly mnemonics? Where are the authoritative instruction set pages? If such a thing were readily available, why are there gaps in the current implementation? A quick Googling gets me [1], but I can't say it's authoritative. What's important is that there certainly are architectures where btr/bts are valid instructions, and they must be supported. btr/bts are certainly not invalid instructions that we're bending over backwards to support, because linux.git/gas works with them. [1]: http://web.itu.edu.tr/kesgin/mul06/intel/index.html From criswell at illinois.edu Wed Jul 10 14:18:33 2013 From: criswell at illinois.edu (John Criswell) Date: Wed, 10 Jul 2013 16:18:33 -0500 Subject: [LLVMdev] Problem Adding New Pass to Alias Analysis Group In-Reply-To: <51DDCCCC.1020008@illinois.edu> References: <51DDB503.3040909@illinois.edu> <51DDCCCC.1020008@illinois.edu> Message-ID: <51DDCFA9.6070705@illinois.edu> Dear All, I think I just found the problem. For those interested, only a few methods of the AliasAnalysis class are virtual; many are non-virtual convenience wrappers that call the virtual methods. A new alias analysis cannot override the non-virtual methods; it must override the virtual methods.
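That dispatch rule — override the virtual query, never the non-virtual convenience wrapper — can be reproduced outside LLVM. In the toy Python sketch below (all names invented, not LLVM's actual interface), the non-virtual wrapper is mimicked by calling it explicitly through the base class, the way a C++ caller statically dispatches a non-virtual method; "overriding" the wrapper is then silently ignored, while overriding the virtual query works.

```python
class AliasAnalysis:
    """Toy model of an analysis-group interface; all names are invented."""

    def alias(self, a, b):
        # The "virtual" query: subclasses are meant to override THIS.
        return "may"

    def is_must_alias(self, a, b):
        # Convenience wrapper. In C++ this method is non-virtual: client
        # code always executes this base-class body, which then dispatches
        # dynamically through the virtual alias().
        return self.alias(a, b) == "must"

def client_query(aa, a, b):
    # Mimic a C++ caller invoking the NON-virtual wrapper through an
    # AliasAnalysis*: the explicit base-class call makes dispatch static.
    return AliasAnalysis.is_must_alias(aa, a, b)

class WrongPass(AliasAnalysis):
    # The mistake from this thread: "overriding" the non-virtual wrapper.
    def is_must_alias(self, a, b):
        return True  # never reached via client_query

class RightPass(AliasAnalysis):
    # The fix: override the virtual query instead.
    def alias(self, a, b):
        return "must"

print(client_query(WrongPass(), "p", "q"))  # False -- override is ignored
print(client_query(RightPass(), "p", "q"))  # True
```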
As it turns out, my alias analysis (more accurately, the alias analysis that I'm porting) overrode the non-virtual wrappers and not the virtual methods, so passes using it would call AliasAnalysis::<non-virtual wrapper>, which called the default AliasAnalysis::<virtual method>. I'm guessing the wrapper methods were virtual in earlier versions of LLVM. -- John T. On 7/10/13 4:06 PM, John Criswell wrote: > On 7/10/13 3:43 PM, Cristianno Martins wrote: >> Hello John, >> >> What opt command line arguments are you using? > > I'm not using opt. I'm manually scheduling a pipeline within a tool. > The code looks like this: > > PassManager pm; > MyAlias * aa = new MyAlias(); > pm.add(aa); > pm.add(new MyAliasUsingPass()); > > Both MyAlias and MyAliasUsingPass are now ModulePass'es. MyAlias is > an alias analysis pass while MyAliasUsingPass is a pass that requires > an alias analysis and performs a test query. > > The output of -debug-pass=Structure is the following: > > No Alias Analysis (always returns 'may' alias) > ModulePass Manager > MyAlias > MyAliasUsingPass > > > I've changed MyAlias to call abort() when it is queried, but the > program never crashes when running MyAliasUsingPass, which indicates > that my MyAlias is never being used for queries. > > I've also tried making MyAlias an ImmutablePass, but that didn't > appear to work either. > >> >> If you follow this link >> , >> you can see that -no-aa is the default alias analysis implementation >> if you do not manually specify which AA passes you want to use. Note >> that you can pass as many different implementations of AA as you >> want, and each of them will be chained together for each function, >> like a pipeline, if the previous one was not able to determine if >> there is a >> dependence or not. > > Yes, I am aware of how analysis groups are *supposed* to work. :) I'm > just not getting the advertised functionality and am at a loss as to > what I could be doing wrong. > > -- John T.
> >> >> Hope this help, >> >> >> >> -- >> Cristianno Martins >> PhD Student of Computer Science >> University of Campinas >> cmartins at ic.unicamp.br >> >> >> On Wed, Jul 10, 2013 at 1:24 PM, John Criswell > > wrote: >> >> Dear All, >> >> I'm trying to add a new alias analysis to the alias analysis >> group in LLVM 3.2. This new pass is linked statically into a >> tool that lives outside the LLVM source tree, so I'm trying to >> avoid making patches to the LLVM sources. >> >> I've added the INITIALIZE_AG_PASS_BEGIN() and >> INITIALIZE_AG_PASS_END() code to the pass, manually scheduled it >> before the MemoryDependenceAnalysis pass, and have tried making >> it a FunctionPass and an ImmutablePass, but no matter what I do, >> it seems like MemoryDependenceAnalysis and other passes keep >> using the -no-aa default pass instead. >> >> 1) Does anyone have ideas on how to verify that my pass is part >> of the alias analysis group? >> >> 2) Does anyone have any ideas on what I might be doing wrong? >> >> Any ideas would be appreciated. >> >> Thanks in advance, >> >> -- John T. >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu >> http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From echristo at gmail.com Wed Jul 10 14:30:08 2013 From: echristo at gmail.com (Eric Christopher) Date: Wed, 10 Jul 2013 14:30:08 -0700 Subject: [LLVMdev] [PATCH] x86: disambiguate unqualified btr, bts In-Reply-To: References: <1373484576-20213-1-git-send-email-artagnon@gmail.com> <416B65E5-1813-4904-BBC0-882AD2F17712@apple.com> Message-ID: On Wed, Jul 10, 2013 at 2:08 PM, Ramkumar Ramachandra wrote: > Jim Grosbach wrote: >> To say that another way, is the assembler correctly diagnosing a previously >> unnoticed problem in the project source code, or is the assembler not >> behaving correctly according the the documented Intel assembly mnemonics? > > Where are the authoritative instruction set pages? If such a thing > were readily available, why are there gaps in the current > implementation? A quick Googling gets me [1], but I can't say it's > authoritative. What's important is that there certainly are > architectures where btr/bts are valid instructions, and they must be > supported. btr/bts are certainly not invalid instructions that we're > bending over backwards to support, because linux.git/gas works with > them. > > [1]: http://web.itu.edu.tr/kesgin/mul06/intel/index.html > _ The correct answer here is "Vol 2a of the Intel Reference Manual available off of the Intel website". As far as whether or not we should support the instructions with a lack of length mnemonic, I'm going to step back away from the discussion. The Intel manuals support Intel syntax (which we also support) which necessarily doesn't have length encodings for most instructions, but AT&T syntax which gas uses (and llvm supports as well) often has length specifiers for instructions. 
-eric From grosbach at apple.com Wed Jul 10 14:44:53 2013 From: grosbach at apple.com (Jim Grosbach) Date: Wed, 10 Jul 2013 14:44:53 -0700 Subject: [LLVMdev] [PATCH] x86: disambiguate unqualified btr, bts In-Reply-To: References: <1373484576-20213-1-git-send-email-artagnon@gmail.com> <416B65E5-1813-4904-BBC0-882AD2F17712@apple.com> Message-ID: <32EF0723-5EE8-4E15-B3DD-482C0EEECFD9@apple.com> On Jul 10, 2013, at 2:30 PM, Eric Christopher wrote: > On Wed, Jul 10, 2013 at 2:08 PM, Ramkumar Ramachandra > wrote: >> Jim Grosbach wrote: >>> To say that another way, is the assembler correctly diagnosing a previously >>> unnoticed problem in the project source code, or is the assembler not >>> behaving correctly according the the documented Intel assembly mnemonics? >> >> Where are the authoritative instruction set pages? If such a thing >> were readily available, why are there gaps in the current >> implementation? A quick Googling gets me [1], but I can't say it's >> authoritative. What's important is that there certainly are >> architectures where btr/bts are valid instructions, and they must be >> supported. btr/bts are certainly not invalid instructions that we're >> bending over backwards to support, because linux.git/gas works with >> them. >> >> [1]: http://web.itu.edu.tr/kesgin/mul06/intel/index.html >> _ > > The correct answer here is "Vol 2a of the Intel Reference Manual > available off of the Intel website”. Yep, for Intel syntax that’s exactly right. > As far as whether or not we should support the instructions with a > lack of length mnemonic, I'm going to step back away from the > discussion. The Intel manuals support Intel syntax (which we also > support) which necessarily doesn't have length encodings for most > instructions, but AT&T syntax which gas uses (and llvm supports as > well) often has length specifiers for instructions. 
The length specifier is, as I understand it, required when the instruction references memory but is optional (and inferred from the registers) for the register variants. The best reference I know of for the AT&T syntax is: http://docs.oracle.com/cd/E19253-01/817-5477/817-5477.pdf That does raise a clarifying question here. Is the code you’re interested in using Intel or AT&T syntax? Also note that the question isn’t whether we should support the btr/bts instructions. We absolutely must (and do). The question is whether we are properly handling the un-suffixed mnemonic form of the assembly syntax. Perhaps you can help by clarifying via explicit example what code you believe should work but does not. Descriptions are great, but specifics are better once we’re this deep into the details. -Jim -------------- next part -------------- An HTML attachment was scrubbed... URL: From s at pahtak.org Wed Jul 10 18:54:58 2013 From: s at pahtak.org (Stephen Checkoway) Date: Wed, 10 Jul 2013 21:54:58 -0400 Subject: [LLVMdev] [PATCH] x86: disambiguate unqualified btr, bts In-Reply-To: <32EF0723-5EE8-4E15-B3DD-482C0EEECFD9@apple.com> References: <1373484576-20213-1-git-send-email-artagnon@gmail.com> <416B65E5-1813-4904-BBC0-882AD2F17712@apple.com> <32EF0723-5EE8-4E15-B3DD-482C0EEECFD9@apple.com> Message-ID: On Jul 10, 2013, at 17:44, Jim Grosbach wrote: > The length specifier is, as I understand it, required when the instruction references memory but is optional (and inferred from the registers) for the register variants. > > The best reference I know of for the AT&T syntax is: http://docs.oracle.com/cd/E19253-01/817-5477/817-5477.pdf I'm not sure I'd use the documentation for the Solaris assembler as authoritative for AT&T syntax, but page 17 does say that if the suffix is omitted it defaults to long. However, that isn't my experience with gas which uses register operands to disambiguate, if possible (although I'm on a phone and can't check right now).
-- Stephen Checkoway -------------- next part -------------- An HTML attachment was scrubbed... URL: From hfinkel at anl.gov Wed Jul 10 19:14:37 2013 From: hfinkel at anl.gov (Hal Finkel) Date: Wed, 10 Jul 2013 21:14:37 -0500 (CDT) Subject: [LLVMdev] Script for stressing llc In-Reply-To: <1718947528.10790691.1373396891368.JavaMail.root@alcf.anl.gov> Message-ID: <1712418091.11275081.1373508877420.JavaMail.root@alcf.anl.gov> A few people have requested features; I've implemented them in this updated version (attached). Do you think this is worth putting in the repo somewhere? -Hal ----- Original Message ----- > Hi, > > I wrote a small script in order to stress test llc using test cases > generated by llvm-stress. When it finds a case where llc seems to > have crashed, it greps the output for Assertion, LLVM ERROR, etc., > removes things that look like hex numbers and ID numbers, and then > checksums the resulting text. In this way, it can automatically > categorize different bugs into different subdirectories. > > I found this useful, and maybe you will too :) > > -Hal > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... 
Name: stress-v2.sh Type: application/x-shellscript Size: 1436 bytes Desc: not available URL: From grosbach at apple.com Wed Jul 10 19:15:15 2013 From: grosbach at apple.com (Jim Grosbach) Date: Wed, 10 Jul 2013 19:15:15 -0700 Subject: [LLVMdev] [PATCH] x86: disambiguate unqualified btr, bts In-Reply-To: References: <1373484576-20213-1-git-send-email-artagnon@gmail.com> <416B65E5-1813-4904-BBC0-882AD2F17712@apple.com> <32EF0723-5EE8-4E15-B3DD-482C0EEECFD9@apple.com> Message-ID: <9F4C233E-884F-4D63-9C48-6E7A9DBCEE56@apple.com> On Jul 10, 2013, at 6:54 PM, Stephen Checkoway wrote: > On Jul 10, 2013, at 17:44, Jim Grosbach wrote: >> The length specifier is, as I understand it, required when the instruction references memory but is optional (and inferred from the registers) for the register variants. >> >> The best reference I know of for the AT&T syntax is: http://docs.oracle.com/cd/E19253-01/817-5477/817-5477.pdf > > I'm not sure I'd use the documentation for the Solaris assembler as authoritative for AT&T syntax, but page 17 does say that if the suffix is omitted it defaults to long. Yeah, me either. That’s part of why I’m asking for references. I’d love to know which ones other people use. Good docs for AT&T syntax are annoyingly hard to find. > However, that isn't my experience with gas which uses register operands to disambiguate, if possible (although I'm on a phone and can't check right now). Yep, it’s the memory reference versions which IIUC require a suffix. I want to make sure we do the right thing here and know why we’re doing it rather than just adding some aliases because it matches gas’ behavior. -Jim -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From eli.friedman at gmail.com Wed Jul 10 19:30:48 2013 From: eli.friedman at gmail.com (Eli Friedman) Date: Wed, 10 Jul 2013 19:30:48 -0700 Subject: [LLVMdev] [PATCH] x86: disambiguate unqualified btr, bts In-Reply-To: <9F4C233E-884F-4D63-9C48-6E7A9DBCEE56@apple.com> References: <1373484576-20213-1-git-send-email-artagnon@gmail.com> <416B65E5-1813-4904-BBC0-882AD2F17712@apple.com> <32EF0723-5EE8-4E15-B3DD-482C0EEECFD9@apple.com> <9F4C233E-884F-4D63-9C48-6E7A9DBCEE56@apple.com> Message-ID: On Wed, Jul 10, 2013 at 7:15 PM, Jim Grosbach wrote: > > On Jul 10, 2013, at 6:54 PM, Stephen Checkoway wrote: > > On Jul 10, 2013, at 17:44, Jim Grosbach wrote: > > The length specifier is, as I understand it, required when the instruction > references memory but is optional (and inferred from the registers) for the > register variants. > > The best reference I know of for the AT&T syntax is: > http://docs.oracle.com/cd/E19253-01/817-5477/817-5477.pdf > > > I'm not sure I'd use the documentation for the Solaris assembler as > authoritative for AT&T syntax, but page 17 does say that if the suffix is > omitted it defaults to long. > > > Yeah, me either. That’s part of why I’m asking for references. I’d love to > know which ones other people use. Good docs for AT&T syntax are annoyingly > hard to find. > > However, that isn't my experience with gas which uses register operands to > disambiguate, if possible (although I'm on a phone and can't check right > now). > > > Yep, it’s the memory reference versions which IIUC require a suffix. > > I want to make sure we do the right thing here and know why we’re doing it > rather than just adding some aliases because it matches gas’ behavior. The reason it's the right thing to do is that the mem/imm forms of btsw and btsl have exactly the same semantics. 
-Eli From silvas at purdue.edu Wed Jul 10 21:12:51 2013 From: silvas at purdue.edu (Sean Silva) Date: Wed, 10 Jul 2013 21:12:51 -0700 Subject: [LLVMdev] Script for stressing llc In-Reply-To: <1712418091.11275081.1373508877420.JavaMail.root@alcf.anl.gov> References: <1718947528.10790691.1373396891368.JavaMail.root@alcf.anl.gov> <1712418091.11275081.1373508877420.JavaMail.root@alcf.anl.gov> Message-ID: The only precedent I have seen in recent years for shell scripts is the (absolutely insanely amazingly well-written) utils/TableGen/tdtags. Ignoring the issue of whether this kind of tool belongs in the repo, IMO it would be nice if you used tdtags as a "template" for this script; there's a large amount of shell-fu (not clever "tricks", but actual "how to make a robust, readable, portable shell script"-fu) in there that you will want to imitate. > LLVMHOME=/home/projects/llvm/upstream/llvm-trunk-build/Release+Asserts/bin > SOPTS="-generate-ppc-fp128" > TOPTS="-mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7" Hardcoding these seems a bit "wrong". Are there any reasonable defaults we can use? ((pwd, empty, host triple) for the three options, respectively, might be reasonable?). -- Sean Silva On Wed, Jul 10, 2013 at 7:14 PM, Hal Finkel wrote: > A few people have requested features; I've implemented them in this > updated version (attached). Do you think this is worth putting in the repo > somewhere? > > -Hal > > ----- Original Message ----- > > Hi, > > > > I wrote a small script in order to stress test llc using test cases > > generated by llvm-stress. When it finds a case where llc seems to > > have crashed, it greps the output for Assertion, LLVM ERROR, etc., > > removes things that look like hex numbers and ID numbers, and then > > checksums the resulting text. In this way, it can automatically > > categorize different bugs into different subdirectories. 
> > > > I found this useful, and maybe you will too :) > > > > -Hal > > > > -- > > Hal Finkel > > Assistant Computational Scientist > > Leadership Computing Facility > > Argonne National Laboratory > > > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hfinkel at anl.gov Wed Jul 10 21:19:39 2013 From: hfinkel at anl.gov (Hal Finkel) Date: Wed, 10 Jul 2013 23:19:39 -0500 (CDT) Subject: [LLVMdev] Script for stressing llc In-Reply-To: Message-ID: <1894914210.11290096.1373516379151.JavaMail.root@alcf.anl.gov> ----- Original Message ----- > > The only precedent I have seen in recent years for shell scripts is > the (absolutely insanely amazingly well-written) > utils/TableGen/tdtags. > > > Ignoring the issue of whether this kind of tool belongs in the repo, Ah, but do you have an opinion on that? > IMO it would be nice if you used tdtags as a "template" for this > script; there's a large amount of shell-fu (not clever "tricks", but > actual "how to make a robust, readable, portable shell script"-fu) > in there that you will want to imitate. Sounds good. I'll look at it. > > > > > LLVMHOME=/home/projects/llvm/upstream/llvm-trunk-build/Release+Asserts/bin > > SOPTS="-generate-ppc-fp128" > > TOPTS="-mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7" > > > Hardcoding these seems a bit "wrong". Are there any reasonable > defaults we can use? ((pwd, empty, host triple) for the three > options, respectively, might be reasonable?). 
Sure ;) -- I did not mean to imply that I would commit them that way (although all of these things can be overridden using the command-line arguments). I think that (`pwd`, empty, empty) would make reasonable defaults. Thanks again, Hal > > > -- Sean Silva > > > > > > On Wed, Jul 10, 2013 at 7:14 PM, Hal Finkel < hfinkel at anl.gov > > wrote: > > > A few people have requested features; I've implemented them in this > updated version (attached). Do you think this is worth putting in > the repo somewhere? > > -Hal > > > > ----- Original Message ----- > > Hi, > > > > I wrote a small script in order to stress test llc using test cases > > generated by llvm-stress. When it finds a case where llc seems to > > have crashed, it greps the output for Assertion, LLVM ERROR, etc., > > removes things that look like hex numbers and ID numbers, and then > > checksums the resulting text. In this way, it can automatically > > categorize different bugs into different subdirectories. > > > > I found this useful, and maybe you will too :) > > > > -Hal > > > > -- > > Hal Finkel > > Assistant Computational Scientist > > Leadership Computing Facility > > Argonne National Laboratory > > > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > -- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory From jevinsweval at gmail.com Wed Jul 10 19:18:23 2013 From: jevinsweval at gmail.com (Jevin Sweval) Date: Wed, 10 Jul 2013 22:18:23 -0400 Subject: [LLVMdev] [PATCH] x86: disambiguate unqualified btr, bts 
In-Reply-To: <32EF0723-5EE8-4E15-B3DD-482C0EEECFD9@apple.com> References: <1373484576-20213-1-git-send-email-artagnon@gmail.com> <416B65E5-1813-4904-BBC0-882AD2F17712@apple.com> <32EF0723-5EE8-4E15-B3DD-482C0EEECFD9@apple.com> Message-ID: <383133607810849727@unknownmsgid> http://www.cs.fsu.edu/~baker/devices/lxr/http/source/linux/arch/x86/include/asm/bitops.h#L68 Here is one example that I found. Are the inline assembly arguments ambiguous in size? -Jevin Sent from my mobile device. Typos are par for the course. On Jul 10, 2013, at 5:47 PM, Jim Grosbach wrote: On Jul 10, 2013, at 2:30 PM, Eric Christopher wrote: On Wed, Jul 10, 2013 at 2:08 PM, Ramkumar Ramachandra wrote: Jim Grosbach wrote: To say that another way, is the assembler correctly diagnosing a previously unnoticed problem in the project source code, or is the assembler not behaving correctly according the the documented Intel assembly mnemonics? Where are the authoritative instruction set pages? If such a thing were readily available, why are there gaps in the current implementation? A quick Googling gets me [1], but I can't say it's authoritative. What's important is that there certainly are architectures where btr/bts are valid instructions, and they must be supported. btr/bts are certainly not invalid instructions that we're bending over backwards to support, because linux.git/gas works with them. [1]: http://web.itu.edu.tr/kesgin/mul06/intel/index.html _ The correct answer here is "Vol 2a of the Intel Reference Manual available off of the Intel website”. Yep, for Intel syntax that’s exactly right. As far as whether or not we should support the instructions with a lack of length mnemonic, I'm going to step back away from the discussion. The Intel manuals support Intel syntax (which we also support) which necessarily doesn't have length encodings for most instructions, but AT&T syntax which gas uses (and llvm supports as well) often has length specifiers for instructions. 
The length specifier is, as I understand it, required when the instruction references memory but is optional (and inferred from the registers) for the register variants. The best reference I know of for the AT&T syntax is: http://docs.oracle.com/cd/E19253-01/817-5477/817-5477.pdf That does raise a clarifying question here. Is the code you’re interested in using Intel or AT&T syntax? Also note that the question isn’t whether we should support the btr/bts instructions. We absolutely must (and do). The question is whether we are properly handling the un-suffixed mnemonic form of the assembly syntax. Perhaps you can help by clarifying via explicit example what code you believe should work but does that. Descriptions are great, but specifics are better once we’re this deep into the details. -Jim _______________________________________________ LLVM Developers mailing list LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From artagnon at gmail.com Wed Jul 10 22:29:32 2013 From: artagnon at gmail.com (Ramkumar Ramachandra) Date: Thu, 11 Jul 2013 10:59:32 +0530 Subject: [LLVMdev] [PATCH] x86: disambiguate unqualified btr, bts In-Reply-To: <32EF0723-5EE8-4E15-B3DD-482C0EEECFD9@apple.com> References: <1373484576-20213-1-git-send-email-artagnon@gmail.com> <416B65E5-1813-4904-BBC0-882AD2F17712@apple.com> <32EF0723-5EE8-4E15-B3DD-482C0EEECFD9@apple.com> Message-ID: Jim Grosbach wrote: > That does raise a clarifying question here. Is the code you’re interested in > using Intel or AT&T syntax? > > Also note that the question isn’t whether we should support the btr/bts > instructions. We absolutely must (and do). The question is whether we are > properly handling the un-suffixed mnemonic form of the assembly syntax. > > Perhaps you can help by clarifying via explicit example what code you > believe should work but does that. 
Descriptions are great, but specifics are > better once we’re this deep into the details. I don't know! This is the first time I'm hearing about differences between Intel and AT&T syntax; I have absolutely no clue how btr/bts _should_ be implemented, and the only specifics I have are real-world examples: linux.git/gas. Does looking at bitops.h in the kernel tree help? For the record, I don't think matching linux.git/gas is a problem: they're very authoritative pieces of software that have been around for a _really_ long time. What matters to me is that real-world software works as expected, even if it violates some transcendental (but unimplemented) authoritative standard [1]. If other mainstream assemblers handle btr/bts differently, I would see a cause for worry; otherwise, no. Eli Friedman wrote: > The reason it's the right thing to do is that the mem/imm forms of > btsw and btsl have exactly the same semantics. Not sure I understand this. [1]: Especially considering that we can't seem to find one. From vsp1729 at gmail.com Wed Jul 10 23:43:22 2013 From: vsp1729 at gmail.com (Vikram Singh) Date: Wed, 10 Jul 2013 23:43:22 -0700 (PDT) Subject: [LLVMdev] Generating unusual instruction In-Reply-To: References: <1357556135143-53192.post@n5.nabble.com> Message-ID: <1373525002442-59245.post@n5.nabble.com> Hi Dongrui, I looked at the MIPS tblgen .td file, but in the MIPS case too it is (1) first subtracting the two operands and then (2) comparing the result with zero, i.e. it is generating two insns. But I want that in just one. Any help? Regards, VSP -- View this message in context: http://llvm.1065342.n5.nabble.com/Generating-unusual-instruction-tp53192p59245.html Sent from the LLVM - Dev mailing list archive at Nabble.com.
From artagnon at gmail.com Thu Jul 11 00:04:05 2013 From: artagnon at gmail.com (Ramkumar Ramachandra) Date: Thu, 11 Jul 2013 12:34:05 +0530 Subject: [LLVMdev] [BUG] Support missing macro arguments Message-ID: Hi, I noticed that a macro defined like: .macro PTREGSCALL label, func, arg is being called without the third argument like: PTREGSCALL stub32_rt_sigreturn, sys32_rt_sigreturn GNU as supports this fine, but LLVM barfs (excerpt taken from linux.git arch/x86/ia32/ia32entry.S). So it should perhaps be demoted to a warning? Thanks. From raghavendrak at huawei.com Thu Jul 11 01:40:03 2013 From: raghavendrak at huawei.com (Raghavendra K) Date: Thu, 11 Jul 2013 08:40:03 +0000 Subject: [LLVMdev] Getting struct member attributes In-Reply-To: References: Message-ID: Hi, I need some help. #define OPT #define MAN struct A { int i; char* c; }; struct B { OPT A a; MAN int i; }; After parsing the above .h file, how can I get the attributes of B's members, specifically member a, which is prefixed with OPT? So far I am able to get the type of a as A, but not OPT, as it is preprocessed and, being empty, may be discarded. regards ragha _______________________________________________ LLVM Developers mailing list LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From eirini_dit at windowslive.com Thu Jul 11 02:37:54 2013 From: eirini_dit at windowslive.com (Eirini _) Date: Thu, 11 Jul 2013 12:37:54 +0300 Subject: [LLVMdev] lower-level IR (A-normal form) In-Reply-To: References: , Message-ID: I would like to create some tables for my instructions in the IR. For example, a table that has all the store instructions. I want all the arguments to a function or instruction or constant etc. to be trivial.
So to fix my previous example, instead of having: call void @llvm.memcpy.i32(i8* %19, i8* getelementptr inbounds ([2 x [2 x [3 x i8]]]* @main.s, i32 0, i32 0, i32 0, i32 0), i32 12, i32 1) I would like to have in my IR the following: %temp = i8* getelementptr inbounds ([2 x [2 x [3 x i8]]]* @main.s, i32 0) %temp1 = i8* getelementptr inbounds (%temp, i32 0) %temp2 = i8* getelementptr inbounds (%temp1, i32 0) %temp3 = i8* getelementptr inbounds (%temp2, i32 0) call void @llvm.memcpy.i32(i8* %19, i8* %temp3, i32 12, i32 1) What I'm asking is whether LLVM can do this for me (for example with a certain flag while compiling the code) or whether I should break these kinds of expressions into simpler ones on my own. Let me rephrase your question slightly. Would you be satisfied if the disassembled llvm format showed constant expressions on their own line, without turning them into instructions or changing the binary representation at all? On Thu, Jul 11, 2013 at 4:05 AM, Eirini _ wrote: Hi, I would like to ask you if I can get a lower-level representation than the llvm IR. For example, having the following instruction in the llvm IR, call void @llvm.memcpy.i32(i8* %19, i8* getelementptr inbounds ([2 x [2 x [3 x i8]]]* @main.s, i32 0, i32 0, i32 0, i32 0), i32 12, i32 1) I would like to get something like this (in A-normal form, without nested instructions): %temp = i8* getelementptr inbounds ([2 x [2 x [3 x i8]]]* @main.s, i32 0, i32 0, i32 0, i32 0) call void @llvm.memcpy.i32(i8* %19, %temp, i32 12, i32 1) Thanks _______________________________________________ LLVM Developers mailing list LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed...
URL: From joerg at britannica.bec.de Thu Jul 11 05:00:13 2013 From: joerg at britannica.bec.de (Joerg Sonnenberger) Date: Thu, 11 Jul 2013 14:00:13 +0200 Subject: [LLVMdev] [PATCH] x86: disambiguate unqualified btr, bts In-Reply-To: References: <1373484576-20213-1-git-send-email-artagnon@gmail.com> <416B65E5-1813-4904-BBC0-882AD2F17712@apple.com> <32EF0723-5EE8-4E15-B3DD-482C0EEECFD9@apple.com> Message-ID: <20130711120013.GA18557@britannica.bec.de> On Thu, Jul 11, 2013 at 10:59:32AM +0530, Ramkumar Ramachandra wrote: > For the record, I don't think matching linux.git/gas is a problem: > they're very authoritative pieces of software that have been around > for a _really_ long time. That's a very, very weak argument. There are a lot of things Clang rejects as errors by default that have been used in old code bases, because GCC accepted them. > Eli Friedman wrote: > > The reason it's the right thing to do is that the mem/imm forms of > > btsw and btsl have exactly the same semantics. > > Not sure I understand this. There is no way to tell from the arguments whether bts should be btsw or btsl. That's the ambiguity it is complaining about. Fixing it is trivial. There are other cases with similar issues in the 8087 syntax with the size of the floating-point operand. Joerg From sebastian.redl at getdesigned.at Thu Jul 11 05:55:39 2013 From: sebastian.redl at getdesigned.at (Sebastian Redl) Date: Thu, 11 Jul 2013 14:55:39 +0200 Subject: [LLVMdev] Getting struct member attributes In-Reply-To: References: Message-ID: <51DEAB4B.6040400@getdesigned.at> On 2013-07-11 10:40, Raghavendra K wrote: > #define OPT > #define MAN > > struct A > { > int i; > char* c; > }; > > > struct B > { > OPT A a; > MAN int i; > }; > > After parsing the above .h file, how to get the attributes of B members specifically > for member A which is prefixed with OPT... > > So far am able to get type of a as A but unable to OPT...as it might be preprocessed and as it is empty > it may discarded or...
> Are you talking about Clang's AST (in which case you should ask your question on cfe-dev, not here) or IR? In either case, empty preprocessor defines are gone. There's no way to recover them. Instead, define OPT and MAN to be something like __attribute__((annotate("opt"))), which will be preserved. Sebastian From pranavb at codeaurora.org Thu Jul 11 11:16:22 2013 From: pranavb at codeaurora.org (Pranav Bhandarkar) Date: Thu, 11 Jul 2013 13:16:22 -0500 Subject: [LLVMdev] Scalar Evolution and Loop Trip Count. Message-ID: <51DEF676.3010207@codeaurora.org> Hi, Scalar evolution seems to be wrapping around the trip count in the following loop. void add (int *restrict a, int *restrict b, int *restrict c) { char i; for (i = 0; i < 255; i++) a[i] = b[i] + c[i]; } When I run scalar evolution on the bit code, I get a backedge-taken count which is obviously wrong. $> cat loop.ll ; Function Attrs: nounwind define void @add(i32* noalias nocapture %a, i32* noalias nocapture %b, i32* noalias nocapture %c) #0 { entry: br label %for.body for.body: ; preds = %entry, %for.body %arrayidx.phi = phi i32* [ %b, %entry ], [ %arrayidx.inc, %for.body ] %arrayidx3.phi = phi i32* [ %c, %entry ], [ %arrayidx3.inc, %for.body ] %arrayidx5.phi = phi i32* [ %a, %entry ], [ %arrayidx5.inc, %for.body ] %indvars.iv = phi i32 [ 0, %entry ], [ %indvars.iv.next, %for.body ] %0 = load i32* %arrayidx.phi, align 4, !tbaa !0 %1 = load i32* %arrayidx3.phi, align 4, !tbaa !0 %add = add nsw i32 %1, %0 store i32 %add, i32* %arrayidx5.phi, align 4, !tbaa !0 %indvars.iv.next = add i32 %indvars.iv, 1 %lftr.wideiv = trunc i32 %indvars.iv.next to i8 %exitcond = icmp eq i8 %lftr.wideiv, -1 %arrayidx.inc = getelementptr i32* %arrayidx.phi, i32 1 %arrayidx3.inc = getelementptr i32* %arrayidx3.phi, i32 1 %arrayidx5.inc = getelementptr i32* %arrayidx5.phi, i32 1 br i1 %exitcond, label %for.end, label %for.body for.end: ; preds = %for.body ret void } $> opt -scalar-evolution -analyze loop.ll Printing analysis 'Scalar 
Evolution Analysis' for function 'add': Classifying expressions for: @add %arrayidx.phi = phi i32* [ %b, %entry ], [ %arrayidx.inc, %for.body ] --> {%b,+,4}<%for.body> Exits: (1016 + %b) %arrayidx3.phi = phi i32* [ %c, %entry ], [ %arrayidx3.inc, %for.body ] --> {%c,+,4}<%for.body> Exits: (1016 + %c) %arrayidx5.phi = phi i32* [ %a, %entry ], [ %arrayidx5.inc, %for.body ] --> {%a,+,4}<%for.body> Exits: (1016 + %a) %indvars.iv = phi i32 [ 0, %entry ], [ %indvars.iv.next, %for.body ] --> {0,+,1}<%for.body> Exits: 254 %0 = load i32* %arrayidx.phi, align 4, !tbaa !0 --> %0 Exits: <> %1 = load i32* %arrayidx3.phi, align 4, !tbaa !0 --> %1 Exits: <> %add = add nsw i32 %1, %0 --> (%0 + %1) Exits: <> %indvars.iv.next = add i32 %indvars.iv, 1 --> {1,+,1}<%for.body> Exits: 255 %lftr.wideiv = trunc i32 %indvars.iv.next to i8 --> {1,+,1}<%for.body> Exits: -1 %arrayidx.inc = getelementptr i32* %arrayidx.phi, i32 1 --> {(4 + %b),+,4}<%for.body> Exits: (1020 + %b) %arrayidx3.inc = getelementptr i32* %arrayidx3.phi, i32 1 --> {(4 + %c),+,4}<%for.body> Exits: (1020 + %c) %arrayidx5.inc = getelementptr i32* %arrayidx5.phi, i32 1 --> {(4 + %a),+,4}<%for.body> Exits: (1020 + %a) Determining loop execution counts for: @add Loop %for.body: backedge-taken count is -2 Loop %for.body: max backedge-taken count is -2 The problem seems to be in SCEVAddRecExpr:getNumIterationsInRange, specifically // If this is an affine expression then we have this situation: // Solve {0,+,A} in Range === Ax in Range // We know that zero is in the range. If A is positive then we know that // the upper value of the range must be the first possible exit value. // If A is negative then the lower of the range is the last possible loop // value. Also note that we already checked for a full range. APInt One(BitWidth,1); APInt A = cast(getOperand(1))->getValue()->getValue(); APInt End = A.sge(One) ? (Range.getUpper() - One) : Range.getLower(); // The exit value should be (End+A)/A. 
APInt ExitVal = (End + A).udiv(A); ConstantInt *ExitValue = ConstantInt::get(SE.getContext(), ExitVal); In gdb, $6 = {BitWidth = 8, {VAL = 254, pVal = 0xfe}} (gdb) p ExitVal.isNegative() $7 = true (gdb) p ExitValue->dump() i8 -2 $8 = void It looks like whenever the value of ExitVal is greater than 128, ExitValue is going to be negative. This makes getBackedgeTakenCount return a negative number, which does not make sense. Any thoughts ? Pranav -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation From justin.holewinski at gmail.com Thu Jul 11 12:07:47 2013 From: justin.holewinski at gmail.com (Justin Holewinski) Date: Thu, 11 Jul 2013 15:07:47 -0400 Subject: [LLVMdev] [cfe-dev] Phabricator down In-Reply-To: References: Message-ID: Is Phabricator down again? I'm starting to see the following: [Rendering Exception] Multiple exceptions during processing and rendering. - AphrontQueryConnectionException: Attempt to connect to phabricator at localhost failed with error #1040: Too many connections. - InvalidArgumentException: Argument 1 passed to AphrontView::setUser() must be an instance of PhabricatorUser, null given, called in /srv/http/phabricator/src/view/page/PhabricatorStandardPageView.php on line 197 and defined On Mon, Jul 8, 2013 at 8:17 AM, Manuel Klimek wrote: > We should be back up - please let me know if anything doesn't work as > expected... > > Cheers, > /Manuel > > > On Mon, Jul 8, 2013 at 12:15 PM, Chandler Carruth wrote: > >> Just as a tiny update, Manuel is actively working on it, but a small >> issue has turned into a larger issue... stay tuned... >> >> >> On Sun, Jul 7, 2013 at 10:11 AM, Manuel Klimek wrote: >> >>> Hi, >>> >>> unfortunately phab has gone down and I currently don't have access to >>> fix it up. I'll work on it first thing tomorrow, so ETA for it getting back >>> is roughly 14 hours. >>> >>> Sorry! 
>>> /Manuel >>> >>> >> > > _______________________________________________ > cfe-dev mailing list > cfe-dev at cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev > > -- Thanks, Justin Holewinski -------------- next part -------------- An HTML attachment was scrubbed... URL: From micah.villmow at smachines.com Thu Jul 11 12:20:18 2013 From: micah.villmow at smachines.com (Micah Villmow) Date: Thu, 11 Jul 2013 19:20:18 +0000 Subject: [LLVMdev] Machine Basic block layout passes, is codegenopt the only pass? [EOM] Message-ID: <3947CD34E13C4F4AB2D94AD35AE3FE6007080788@smi-exchange1.smi.local> -------------- next part -------------- An HTML attachment was scrubbed... URL: From klimek at google.com Thu Jul 11 12:27:55 2013 From: klimek at google.com (Manuel Klimek) Date: Thu, 11 Jul 2013 21:27:55 +0200 Subject: [LLVMdev] [cfe-dev] Phabricator down In-Reply-To: References: Message-ID: Yep, sorry, we ran out of space on the instance's database volume. I'll update this once we're back up. On Thu, Jul 11, 2013 at 9:07 PM, Justin Holewinski < justin.holewinski at gmail.com> wrote: > Is Phabricator down again? I'm starting to see the following: > > [Rendering Exception] Multiple exceptions during processing and rendering. > - AphrontQueryConnectionException: Attempt to connect to phabricator at localhost failed with error #1040: Too many connections. > - InvalidArgumentException: Argument 1 passed to AphrontView::setUser() must be an instance of PhabricatorUser, null given, called in /srv/http/phabricator/src/view/page/PhabricatorStandardPageView.php on line 197 and defined > > > > > On Mon, Jul 8, 2013 at 8:17 AM, Manuel Klimek wrote: > >> We should be back up - please let me know if anything doesn't work as >> expected... >> >> Cheers, >> /Manuel >> >> >> On Mon, Jul 8, 2013 at 12:15 PM, Chandler Carruth wrote: >> >>> Just as a tiny update, Manuel is actively working on it, but a small >>> issue has turned into a larger issue... stay tuned... 
>>> >>> >>> On Sun, Jul 7, 2013 at 10:11 AM, Manuel Klimek wrote: >>> >>>> Hi, >>>> >>>> unfortunately phab has gone down and I currently don't have access to >>>> fix it up. I'll work on it first thing tomorrow, so ETA for it getting back >>>> is roughly 14 hours. >>>> >>>> Sorry! >>>> /Manuel >>>> >>>> >>> >> >> _______________________________________________ >> cfe-dev mailing list >> cfe-dev at cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev >> >> > > > -- > > Thanks, > > Justin Holewinski > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eli.friedman at gmail.com Thu Jul 11 12:46:12 2013 From: eli.friedman at gmail.com (Eli Friedman) Date: Thu, 11 Jul 2013 12:46:12 -0700 Subject: [LLVMdev] Scalar Evolution and Loop Trip Count. In-Reply-To: <51DEF676.3010207@codeaurora.org> References: <51DEF676.3010207@codeaurora.org> Message-ID: On Thu, Jul 11, 2013 at 11:16 AM, Pranav Bhandarkar wrote: > Hi, > > Scalar evolution seems to be wrapping around the trip count in the following > loop. > > void add (int *restrict a, int *restrict b, int *restrict c) { > char i; > for (i = 0; i < 255; i++) > a[i] = b[i] + c[i]; > } > > When I run scalar evolution on the bit code, I get a backedge-taken count > which is obviously wrong. 
> $> cat loop.ll > ; Function Attrs: nounwind > define void @add(i32* noalias nocapture %a, i32* noalias nocapture %b, i32* > noalias nocapture %c) #0 { > entry: > br label %for.body > > for.body: ; preds = %entry, > %for.body > %arrayidx.phi = phi i32* [ %b, %entry ], [ %arrayidx.inc, %for.body ] > %arrayidx3.phi = phi i32* [ %c, %entry ], [ %arrayidx3.inc, %for.body ] > %arrayidx5.phi = phi i32* [ %a, %entry ], [ %arrayidx5.inc, %for.body ] > %indvars.iv = phi i32 [ 0, %entry ], [ %indvars.iv.next, %for.body ] > %0 = load i32* %arrayidx.phi, align 4, !tbaa !0 > %1 = load i32* %arrayidx3.phi, align 4, !tbaa !0 > %add = add nsw i32 %1, %0 > store i32 %add, i32* %arrayidx5.phi, align 4, !tbaa !0 > %indvars.iv.next = add i32 %indvars.iv, 1 > %lftr.wideiv = trunc i32 %indvars.iv.next to i8 > %exitcond = icmp eq i8 %lftr.wideiv, -1 > %arrayidx.inc = getelementptr i32* %arrayidx.phi, i32 1 > %arrayidx3.inc = getelementptr i32* %arrayidx3.phi, i32 1 > %arrayidx5.inc = getelementptr i32* %arrayidx5.phi, i32 1 > br i1 %exitcond, label %for.end, label %for.body > > for.end: ; preds = %for.body > ret void > } > > $> opt -scalar-evolution -analyze loop.ll > Printing analysis 'Scalar Evolution Analysis' for function 'add': > Classifying expressions for: @add > %arrayidx.phi = phi i32* [ %b, %entry ], [ %arrayidx.inc, %for.body ] > --> {%b,+,4}<%for.body> Exits: (1016 + %b) > %arrayidx3.phi = phi i32* [ %c, %entry ], [ %arrayidx3.inc, %for.body ] > --> {%c,+,4}<%for.body> Exits: (1016 + %c) > %arrayidx5.phi = phi i32* [ %a, %entry ], [ %arrayidx5.inc, %for.body ] > --> {%a,+,4}<%for.body> Exits: (1016 + %a) > %indvars.iv = phi i32 [ 0, %entry ], [ %indvars.iv.next, %for.body ] > --> {0,+,1}<%for.body> Exits: 254 > %0 = load i32* %arrayidx.phi, align 4, !tbaa !0 > --> %0 Exits: <> > %1 = load i32* %arrayidx3.phi, align 4, !tbaa !0 > --> %1 Exits: <> > %add = add nsw i32 %1, %0 > --> (%0 + %1) Exits: <> > %indvars.iv.next = add i32 %indvars.iv, 1 > --> {1,+,1}<%for.body> Exits: 
255 > %lftr.wideiv = trunc i32 %indvars.iv.next to i8 > --> {1,+,1}<%for.body> Exits: -1 > %arrayidx.inc = getelementptr i32* %arrayidx.phi, i32 1 > --> {(4 + %b),+,4}<%for.body> Exits: (1020 + %b) > %arrayidx3.inc = getelementptr i32* %arrayidx3.phi, i32 1 > --> {(4 + %c),+,4}<%for.body> Exits: (1020 + %c) > %arrayidx5.inc = getelementptr i32* %arrayidx5.phi, i32 1 > --> {(4 + %a),+,4}<%for.body> Exits: (1020 + %a) > Determining loop execution counts for: @add > Loop %for.body: backedge-taken count is -2 > Loop %for.body: max backedge-taken count is -2 > > The problem seems to be in SCEVAddRecExpr::getNumIterationsInRange, > specifically > // If this is an affine expression then we have this situation: > // Solve {0,+,A} in Range === Ax in Range > > // We know that zero is in the range. If A is positive then we know > that > // the upper value of the range must be the first possible exit value. > // If A is negative then the lower of the range is the last possible > loop > // value. Also note that we already checked for a full range. > APInt One(BitWidth,1); > APInt A = > cast<SCEVConstant>(getOperand(1))->getValue()->getValue(); > APInt End = A.sge(One) ? (Range.getUpper() - One) : Range.getLower(); > > // The exit value should be (End+A)/A. > APInt ExitVal = (End + A).udiv(A); > ConstantInt *ExitValue = ConstantInt::get(SE.getContext(), ExitVal); > > In gdb, > $6 = {BitWidth = 8, {VAL = 254, pVal = 0xfe}} > (gdb) p ExitVal.isNegative() > $7 = true > (gdb) p ExitValue->dump() > i8 -2 > $8 = void > > It looks like whenever the value of ExitVal is 128 or greater, ExitValue > is going to be negative. This makes getBackedgeTakenCount return a negative > number, which does not make sense. Any thoughts? getBackedgeTakenCount returns an unsigned number; we're just printing it wrong.
-Eli From silvas at purdue.edu Thu Jul 11 13:09:51 2013 From: silvas at purdue.edu (Sean Silva) Date: Thu, 11 Jul 2013 13:09:51 -0700 Subject: [LLVMdev] Script for stressing llc In-Reply-To: <1894914210.11290096.1373516379151.JavaMail.root@alcf.anl.gov> References: <1894914210.11290096.1373516379151.JavaMail.root@alcf.anl.gov> Message-ID: On Wed, Jul 10, 2013 at 9:19 PM, Hal Finkel wrote: > ----- Original Message ----- > > > > The only precedent I have seen in recent years for shell scripts is > > the (absolutely insanely amazingly well-written) > > utils/TableGen/tdtags. > > > > > > Ignoring the issue of whether this kind of tool belongs in the repo, > > Ah, but do you have an opinion on that? > > For me, the biggest question is whether this will 1. Identify a bunch of bugs, and then they will be fixed, and then the script turns up no more bugs. or 2. Turn up new bugs with regular frequency as options are varied. In case 1, this is really a short-term tool that few people need to actually run and file the bugs it finds. In case 2, we really want this kind of script to make it as easy as possible for as many people as possible to use in as many configurations as possible, and so it makes sense to have it in-tree and available. Which case do you think this falls into? -- Sean Silva -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dl9pf at gmx.de Thu Jul 11 13:25:13 2013 From: dl9pf at gmx.de (Jan-Simon Möller) Date: Thu, 11 Jul 2013 22:25:13 +0200 Subject: [LLVMdev] [PATCH] x86: disambiguate unqualified btr, bts In-Reply-To: <383133607810849727@unknownmsgid> References: <1373484576-20213-1-git-send-email-artagnon@gmail.com> <32EF0723-5EE8-4E15-B3DD-482C0EEECFD9@apple.com> <383133607810849727@unknownmsgid> Message-ID: <13482539.3EiqhZXsap@aragorn.auenland.lan> On Wednesday 10 July 2013 22:18:23 Jevin Sweval wrote: > http://www.cs.fsu.edu/~baker/devices/lxr/http/source/linux/arch/x86/include/ > asm/bitops.h#L68 > > Here is one example that I found. Are the inline assembly arguments > ambiguous in size? It would certainly help us to build the kernel and other projects. -- JS From swlin at post.harvard.edu Thu Jul 11 13:41:11 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Thu, 11 Jul 2013 13:41:11 -0700 Subject: [LLVMdev] Bikeshedding a name for new directive: CHECK-LABEL vs. CHECK-BOUNDARY vs. something else. Message-ID: Hi, I would like to add a new directive to FileCheck called CHECK-FOO (where FOO is a name under discussion right now) which is used to improve error messages. The idea is that you would use CHECK-FOO on any line that contains a unique identifier (typically labels, function definitions, etc.) that is guaranteed to only occur once in the file; FileCheck will then conceptually break the input into blocks separated by these unique identifier lines and perform all other checks localized to between the appropriate blocks; it can even recover from an error in one block and move on to another. As an example, I purposely introduced a switch fall-through bug in the last patch I submitted to llvm-commits ("Allow FMAs in safe math mode in some cases when one operand of the fmul is either exactly 0.0 or exactly 1.0")...
Bug diff: diff --git a/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/lib/CodeGen/SelectionDAG/DAGCombiner.cpp index 0290afc..239b119 100644 --- a/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ b/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -5791,7 +5791,7 @@ static bool isExactlyZeroOrOne(const TargetLowering &TLI, const SDValue &Op) { continue; } } - break; +// break; case ISD::FADD: if (ConstantFPSDNode *V0CFP = dyn_cast<ConstantFPSDNode>(V->getOperand(0))) { The single error message without CHECK-FOO is: ; CHECK: test_add_8 ^ <stdin>:125:2: note: scanning from here .cfi_endproc ^ <stdin>:127:10: note: possible intended match here .globl _test_add_10 ^ The error messages with CHECK-FOO on the function name label lines are: ; CHECK: vmulsd ^ <stdin>:87:2: note: scanning from here .align 4, 0x90 ^ <stdin>:95:2: note: possible intended match here vsubsd %xmm0, %xmm3, %xmm0 ^ fp-contract.ll:118:15: error: expected string not found in input ; CHECK: vmulsd ^ <stdin>:102:2: note: scanning from here .align 4, 0x90 ^ <stdin>:109:2: note: possible intended match here vsubsd %xmm2, %xmm3, %xmm2 ^ fp-contract.ll:288:15: error: expected string not found in input ; CHECK: vmulsd ^ <stdin>:258:2: note: scanning from here .align 4, 0x90 ^ <stdin>:266:2: note: possible intended match here vsubsd %xmm0, %xmm3, %xmm0 ^ Does anyone have any suggestions on what FOO should be? In my current patch it's LABEL, but Eli B. suggested BOUNDARY. Any opinions or other suggestions? Thanks, Stephen On Thu, Jul 11, 2013 at 1:33 PM, Stephen Lin wrote: > It's just short for BOUNDARY. I think BOUNDARY is too long :D > I prefer LABEL though. I can send this to the dev list and ask for > opinions there. > Stephen > > On Thu, Jul 11, 2013 at 12:54 PM, Eli Bendersky wrote: >> >> On Thu, Jul 11, 2013 at 12:44 PM, Stephen Lin >> wrote: >>> >>> Actually, I would be ok with CHECK-BOUND as well. >>> Eli, is that OK with you? And does anyone else want to chime in? >>> I will expand the docs either way. >>> Thanks, >>> Stephen >> >> >> I'm not sure what BOUND means in this case?
And how is it different from >> BOUNDARY? >> >> I'm just thinking of someone reading the test file and looking at all the >> directives. BOUNDARY conveys a spatial meaning and it's easy to intuitively >> remember what its semantics are. My opposition to LABEL was because LABEL >> conveyed no such meaning and I think it would be confusing. As for BOUND vs. >> BOUNDARY, that's really a minor issue and perhaps my knowledge of English >> fails me here, but I'd be happy to hear the reasoning. >> >> Eli >> >> >> >> >> >>> >>> >>> On Thu, Jul 11, 2013 at 12:40 PM, Stephen Lin >>> wrote: >>> > Thanks Owen; Andy (Trick) off-list says he thinks it's a good idea, too. >>> > >>> > Eli B. (also off-list) thinks that the documentation can be approved >>> > and also suggests that the name CHECK-BOUNDARY is better. Anyone else >>> > have an opinion? >>> > >>> > I much prefer CHECK-LABEL to CHECK-BOUNDARY myself, but I am willing >>> > to paint the bike shed whatever color others can agree on. >>> > >>> > Stephen >>> > >>> > On Thu, Jul 11, 2013 at 12:31 PM, Owen Anderson >>> > wrote: >>> >> I'm not familiar enough with the FileCheck internals to comment on the >>> >> implementation, but I *really* like this feature. I've spent way too much >>> >> time over the years tracking down cryptic FileCheck errors that would have >>> >> been solved by this. >>> >> >>> >> --Owen >>> >> >>> >> On Jul 11, 2013, at 10:50 AM, Stephen Lin >>> >> wrote: >>> >> >>> >>> Hi, >>> >>> >>> >>> Can anyone review this patch? It adds a new directive type called >>> >>> "CHECK-LABEL" to FileCheck... >>> >>> >>> >>> If present in a match file, FileCheck will use these directives to >>> >>> split the input into blocks that are independently processed, ensuring >>> >>> that a CHECK does not inadvertently match a line in a different block >>> >>> (which can lead to a misleading/useless error message when the error >>> >>> is eventually caught). 
Also, FileCheck can now recover from errors >>> >>> within blocks by continuing to the next block. >>> >>> >>> >>> As an example, I purposely introduced the a switch fall-through bug in >>> >>> the last patch I submitted to llvm-commits ("Allow FMAs in safe math >>> >>> mode in some cases when one operand of the fmul is either exactly 0.0 >>> >>> or exactly 1.0")... >>> >>> >>> >>> Bug diff: >>> >>> >>> >>> diff --git a/lib/CodeGen/SelectionDAG/DAGCombiner.cpp >>> >>> b/lib/CodeGen/SelectionDAG/DAGCombiner.cpp >>> >>> index 0290afc..239b119 100644 >>> >>> --- a/lib/CodeGen/SelectionDAG/DAGCombiner.cpp >>> >>> +++ b/lib/CodeGen/SelectionDAG/DAGCombiner.cpp >>> >>> @@ -5791,7 +5791,7 @@ static bool isExactlyZeroOrOne(const >>> >>> TargetLowering &TLI, const SDValue &Op) { >>> >>> continue; >>> >>> } >>> >>> } >>> >>> - break; >>> >>> +// break; >>> >>> case ISD::FADD: >>> >>> if (ConstantFPSDNode *V0CFP = >>> >>> dyn_cast(V->getOperand(0))) { >>> >>> >>> >>> The single error message without CHECK-LABEL is: >>> >>> >>> >>> ; CHECK-SAFE: test_add_8 >>> >>> ^ >>> >>> :125:2: note: scanning from here >>> >>> .cfi_endproc >>> >>> ^ >>> >>> :127:10: note: possible intended match here >>> >>> .globl _test_add_10 >>> >>> ^ >>> >>> >>> >>> The error messages with CHECK-LABEL are: >>> >>> >>> >>> ; CHECK-SAFE: vmulsd >>> >>> ^ >>> >>> :87:2: note: scanning from here >>> >>> .align 4, 0x90 >>> >>> ^ >>> >>> :95:2: note: possible intended match here >>> >>> vsubsd %xmm0, %xmm3, %xmm0 >>> >>> ^ >>> >>> fp-contract.ll:118:15: error: expected string not found in input >>> >>> ; CHECK-SAFE: vmulsd >>> >>> ^ >>> >>> :102:2: note: scanning from here >>> >>> .align 4, 0x90 >>> >>> ^ >>> >>> :109:2: note: possible intended match here >>> >>> vsubsd %xmm2, %xmm3, %xmm2 >>> >>> ^ >>> >>> fp-contract.ll:288:15: error: expected string not found in input >>> >>> ; CHECK-SAFE: vmulsd >>> >>> ^ >>> >>> :258:2: note: scanning from here >>> >>> .align 4, 0x90 >>> >>> ^ >>> >>> :266:2: note: 
possible intended match here >>> >>> vsubsd %xmm0, %xmm3, %xmm0 >>> >>> ^ >>> >>> >>> >>> The three error messages in the CHECK-LABEL case exactly pinpoint the >>> >>> source lines of the actual problem in three separate blocks; the >>> >>> single error message given without CHECK-LABEL is (imho) much less >>> >>> useful. >>> >>> >>> >>> (In this case, the non-CHECK-LABEL version happens to error on >>> >>> a label line, so the user could presume that the error happened in the >>> >>> block immediately before test_add_8, which is correct, but in general >>> >>> this might not be true; the only thing that can be concluded is that >>> >>> the error happened sometime before test_add_8.) >>> >>> >>> >>> Please let me know if you have any feedback. >>> >>> >>> >>> Stephen >>> >>> >>> >>> ---------- Forwarded message ---------- >>> >>> From: Stephen Lin >>> >>> Date: Mon, Jun 10, 2013 at 4:21 PM >>> >>> Subject: [PATCH] Add CHECK-LABEL directive to FileCheck to allow more >>> >>> accurate error messages and error recovery >>> >>> To: llvm-commits at cs.uiuc.edu >>> >>> >>> >>> >>> >>> Actually, I went ahead and renamed it CHECK-LABEL and rebased, since I >>> >>> think it’s better :) >>> >>> -Stephen >>> >>> _______________________________________________ >>> >>> llvm-commits mailing list >>> >>> llvm-commits at cs.uiuc.edu >>> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits >>> >> >>> >> >>> >> _______________________________________________ >>> >> llvm-commits mailing list >>> >> llvm-commits at cs.uiuc.edu >>> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits >> >> From dmitry at kernelgen.org Thu Jul 11 15:51:01 2013 From: dmitry at kernelgen.org (Dmitry Mikushin) Date: Fri, 12 Jul 2013 00:51:01 +0200 Subject: [LLVMdev] LLVM x86 backend for Intel MIC : trying it out and questions Message-ID: Dear all, I'm interested in analysing what could be done with current LLVM trunk to deliver basic Intel MIC support.
Let's say, for basic level we'd want just scalar code execution, no threading, no zmm vectors. The attached patch is verbose in text but functionally very simple: it copy-pastes the x86 and x86_64 backends into 32-bit and 64-bit K1OM. At the end of the message you can find how simple LLVM-generated programs could be compiled & executed on a MIC device, using this patch. Could you please help me find answers to the following questions: 1) Is there actually a 32-bit mode for MIC? 32-bit ELFs are not recognized, so... 2) MIC ISA is 32-bit ISA (no SSE/MMX) plus 512-bit AVX-like vectors? 3) If 1 is "no" and 2 is "yes", then does the MIC calling convention permit generation of programs that use only the 32-bit x86 ISA? In other words, in the common case, does the calling convention require use of zmm registers (e.g. to return a double value) even in scalar programs? Thanks, - D. === $ cat hello.c #include <stdio.h> int main() { printf("Hello, Intel MIC!\n"); return 0; } $ PATH=$PATH:~/rpmbuild/CHROOT/opt/kernelgen/usr/bin clang -emit-llvm -c hello.c -o - | PATH=$PATH:~/forge/llvm/install/bin/ opt -O3 -S -o hello.ll $ cat hello.ll ; ModuleID = '<stdin>' target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" @str = private unnamed_addr constant [18 x i8] c"Hello, Intel MIC!\00" ; Function Attrs: nounwind uwtable define i32 @main() #0 { entry: %puts = tail call i32 @puts(i8* getelementptr inbounds ([18 x i8]* @str, i64 0, i64 0)) ret i32 0 } ; Function Attrs: nounwind declare i32 @puts(i8* nocapture) #1 attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" } attributes #1 = { nounwind } $ PATH=$PATH:~/forge/llvm/install/bin/ llc hello.ll -march=k1om64 -filetype=obj -o hello.mic.o $ objdump -d
hello.mic.o hello.mic.o: file format elf64-k1om Disassembly of section .text: 0000000000000000 <main>: 0: 55 push %rbp 1: 48 89 e5 mov %rsp,%rbp 4: bf 00 00 00 00 mov $0x0,%edi 9: e8 00 00 00 00 callq e e: 31 c0 xor %eax,%eax 10: 5d pop %rbp 11: c3 retq $ icc -mmic hello.mic.o -o hello x86_64-k1om-linux-ld: error in hello.mic.o(.eh_frame); no .eh_frame_hdr table will be created. $ /opt/intel/mic/bin/micnativeloadex ./hello Hello, Intel MIC! -------------- next part -------------- A non-text attachment was scrubbed... Name: llvm.k1om.patch Type: application/octet-stream Size: 21451 bytes Desc: not available URL: From chris.matthews at apple.com Thu Jul 11 17:00:38 2013 From: chris.matthews at apple.com (Chris Matthews) Date: Thu, 11 Jul 2013 17:00:38 -0700 Subject: [LLVMdev] John the Ripper in the test suite? Message-ID: I am looking at adding some tests based on John the Ripper to the test suite repository. http://www.openwall.com/john/ Does anyone have a problem with this? Are there specific algorithms people would like to see benchmarked? Thx Chris Matthews chris.matthews@.com (408) 783-6335 -------------- next part -------------- An HTML attachment was scrubbed... URL: From rkotler at mips.com Thu Jul 11 17:29:39 2013 From: rkotler at mips.com (Reed Kotler) Date: Thu, 11 Jul 2013 17:29:39 -0700 Subject: [LLVMdev] John the Ripper in the test suite? In-Reply-To: References: Message-ID: <51DF4DF3.1030301@mips.com> Be careful about license issues. I.e. gpl. On 07/11/2013 05:00 PM, Chris Matthews wrote: > I am looking at adding some tests based on John the Ripper to the test > suite repository. > > http://www.openwall.com/john/ > > Does anyone have a problem with this? > > Are there specific algorithms people would like to see benchmarked?
> > Thx > > Chris Matthews > chris.matthews@.com > (408) 783-6335 > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From chris.matthews at apple.com Thu Jul 11 17:35:29 2013 From: chris.matthews at apple.com (Chris Matthews) Date: Thu, 11 Jul 2013 17:35:29 -0700 Subject: [LLVMdev] John the Ripper in the test suite? In-Reply-To: <51DF4DF3.1030301@mips.com> References: <51DF4DF3.1030301@mips.com> Message-ID: Yes, it looks like the core is all GPL2. But they go on a file-by-file basis. Chris Matthews chris.matthews at apple.com (408) 783-6335 On Jul 11, 2013, at 5:29 PM, Reed Kotler wrote: > Be careful about license issues. > > I.e. gpl. > > > On 07/11/2013 05:00 PM, Chris Matthews wrote: >> I am looking at adding some tests based on John the Ripper to the test >> suite repository. >> >> http://www.openwall.com/john/ >> >> Does anyone have a problem with this? >> >> Are there specific algorithms people would like to see benchmarked? >> >> Thx >> >> Chris Matthews >> chris.matthews@.com >> (408) 783-6335 >> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From eli.friedman at gmail.com Thu Jul 11 17:44:33 2013 From: eli.friedman at gmail.com (Eli Friedman) Date: Thu, 11 Jul 2013 17:44:33 -0700 Subject: [LLVMdev] John the Ripper in the test suite? In-Reply-To: References: <51DF4DF3.1030301@mips.com> Message-ID: Note that the test-suite repository has different licensing rules from the rest of the LLVM codebase. Basically anything that's freely redistributable (including GPL code) is allowed. -Eli On Thu, Jul 11, 2013 at 5:35 PM, Chris Matthews wrote: > Yes, it looks like the core is all GPL2. 
But they go on a file-by-file > basis. > > Chris Matthews > chris.matthews at apple.com > (408) 783-6335 > > On Jul 11, 2013, at 5:29 PM, Reed Kotler wrote: > > Be careful about license issues. > > I.e. gpl. > > > On 07/11/2013 05:00 PM, Chris Matthews wrote: > > I am looking at adding some tests based on John the Ripper to the test > suite repository. > > http://www.openwall.com/john/ > > Does anyone have a problem with this? > > Are there specific algorithms people would like to see benchmarked? > > Thx > > Chris Matthews > chris.matthews@.com > (408) 783-6335 > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From nlewycky at google.com Thu Jul 11 17:45:58 2013 From: nlewycky at google.com (Nick Lewycky) Date: Thu, 11 Jul 2013 17:45:58 -0700 Subject: [LLVMdev] design for an accurate ODR-checker with clang Message-ID: Hi! A few of us over at Google think a nice feature in clang would be ODR violation checking, and we thought for a while about how to do this and wrote it down, but we aren't actively working on it at the moment nor plan to in the near future. I'm posting this to share our design and hopefully save anyone else the design work if they're interested in it. For some background, C++'s ODR rule roughly means that two definitions of the same symbol must come from "the same tokens with the same interpretation". Given the same token stream, the interpretation can be different due to different name lookup results, or different types through typedefs or using declarations, or due to a different point of instantiation in two translation units. 
Unlike existing approaches (the ODR checker in the gold linker for example), clang lets us do this with no false positives and very few false negatives. The basis of the idea is that we produce a hash of all the ODR-relevant pieces, trying to pick the largest possible granularity. By granularity I mean that we would hash the entire definition of a class including all methods defined lexically inline and emit a single value for that class. The first step is to build a new visitor over the clang AST that calculates a hash of the ODR-relevant pieces of the code. (StmtProfiler doesn’t work here because it includes pointer addresses which will be different across different translation units.) Hash the outermost declaration with external linkage. For example, given a class with a method defined inline, we start the visitor at the class, not at the method. The entirety of the class must be ODR-equivalent across two translation units, including any inline methods. Although the standard mentions that the tokens must be the same, we do not actually include the tokens in the hash. The structure of the AST includes everything about the code which is semantically relevant. Any false positives that would be fixed by hashing the tokens either do not impact the behaviour of the program or could be fixed by hashing more of the AST. References to globals should be hashed by name, but references to locals should be hashed by an ordinal number. Instantiated templates are also visited by the hashing visitor. If we did not, we would have false negatives where the code is not conforming due to different points of instantiation in two translation units. We can skip uninstantiated templates since they don’t affect the behaviour of the program, and we need to visit the instantiations regardless. In LLVM IR, create a new named metadata node !llvm.odr_checking which contains a list of <name, hash> pairs.
The names do not necessarily correspond to symbols, for instance, a class will have a hash value but does not have a corresponding symbol. For ease of implementation, names should be mangled per the C++ Itanium ABI (demanglable with c++filt -t). Merging modules that contain these will need to do ODR checking as part of that link, and the resulting module will have the union of these tables. In the .o file, emit a sorted table of <name, hash> pairs in a non-loadable section intended to be read by the linker. All entries in the table must be checked if any symbol from this .o file is involved in the link (note that there is no mapping from symbol to odr table name). If two .o files contain different hash values for the same name, we have detected an ODR violation and issue a diagnostic. Finally, teach the loader (RuntimeDyld) to do verification and catch ODR violations when dlopen'ing a shared library. Nick -------------- next part -------------- An HTML attachment was scrubbed... URL: From rjmccall at apple.com Thu Jul 11 18:02:28 2013 From: rjmccall at apple.com (John McCall) Date: Thu, 11 Jul 2013 18:02:28 -0700 Subject: [LLVMdev] [cfe-dev] design for an accurate ODR-checker with clang In-Reply-To: References: Message-ID: <20841B25-DA88-44B5-AC65-9202C6FDA0E9@apple.com> On Jul 11, 2013, at 5:45 PM, Nick Lewycky wrote: > Hi! A few of us over at Google think a nice feature in clang would be ODR violation checking, and we thought for a while about how to do this and wrote it down, but we aren't actively working on it at the moment nor plan to in the near future. I'm posting this to share our design and hopefully save anyone else the design work if they're interested in it. > > For some background, C++'s ODR rule roughly means that two definitions of the same symbol must come from "the same tokens with the same interpretation".
Given the same token stream, the interpretation can be different due to different name lookup results, or different types through typedefs or using declarations, or due to a different point of instantiation in two translation units. > > Unlike existing approaches (the ODR checker in the gold linker for example), clang lets us do this with no false positives and very few false negatives. The basis of the idea is that we produce a hash of all the ODR-relevant pieces, and to try to pick the largest possible granularity. By granularity I mean that we would hash the entire definition of a class including all methods defined lexically inline and emit a single value for that class. > > The first step is to build a new visitor over the clang AST that calculates a hash of the ODR-relevant pieces of the code. (StmtProfiler doesn’t work here because it includes pointers addresses which will be different across different translation units.) Hash the outermost declaration with external-linkage. For example, given a class with a method defined inline, we start the visitor at the class, not at the method. The entirety of the class must be ODR-equivalent across two translation units, including any inline methods. > > Although the standard mentions that the tokens must be the same, we do not actually include the tokens in the hash. The structure of the AST includes everything about the code which is semantically relevant. Any false positives that would be fixed by hashing the tokens either do not impact the behaviour of the program or could be fixed by hashing more of the AST. References to globals should be hashed by name, but references to locals should be hashed by an ordinal number. > > Instantiated templates are also visited by the hashing visitor. If we did not, we would have false negatives where the code is not conforming due to different points of instantiation in two translation units. 
We can skip uninstantiated templates since they don’t affect the behaviour of the program, and we need to visit the instantiations regardless. > > In LLVM IR, create a new named metadata node !llvm.odr_checking which contains a list of pairs. The names do not necessarily correspond to symbols, for instance, a class will have a hash value but does not have a corresponding symbol. For ease of implementation, names should be mangled per the C++ Itanium ABI (demanglable with c++filt -t). Merging modules that contain these will need to do ODR checking as part of that link, and the resulting module will have the union of these tables. > > In the .o file, emit a sorted table of in a non-loadable section intended to be read by the linker. All entries in the table must be checked if any symbol from this .o file is involved in the link (note that there is no mapping from symbol to odr table name). If two .o files contain different hash values for the same name, we have detected an ODR violation and issue a diagnostic. > > Finally, teach the loader (RuntimeDyld) to do verification and catch ODR violations when dlopen'ing a shared library. This is the right basic design, but I'm curious why you're suggesting that the payload should just be a hash instead of an arbitrary string. This isn't going to be performant enough to do unconditionally at every load no matter how much you shrink it. Also, you should have something analogous to symbol visibility as a way to tell the static linker that something only needs to be ODR-checked within a linkage unit. It would be informed by actual symbol visibility, of course. You should expect that there may be multiple hashing schemes (or versions thereof) in play and therefore build a simple prefixing scheme on your ODR symbols. John. 
From nlewycky at google.com Thu Jul 11 18:13:49 2013 From: nlewycky at google.com (Nick Lewycky) Date: Thu, 11 Jul 2013 18:13:49 -0700 Subject: [LLVMdev] [cfe-dev] design for an accurate ODR-checker with clang In-Reply-To: <20841B25-DA88-44B5-AC65-9202C6FDA0E9@apple.com> References: <20841B25-DA88-44B5-AC65-9202C6FDA0E9@apple.com> Message-ID: On 11 July 2013 18:02, John McCall wrote: > On Jul 11, 2013, at 5:45 PM, Nick Lewycky wrote: > > Hi! A few of us over at Google think a nice feature in clang would be > ODR violation checking, and we thought for a while about how to do this and > wrote it down, but we aren't actively working on it at the moment nor plan > to in the near future. I'm posting this to share our design and hopefully > save anyone else the design work if they're interested in it. > > > > For some background, C++'s ODR rule roughly means that two definitions > of the same symbol must come from "the same tokens with the same > interpretation". Given the same token stream, the interpretation can be > different due to different name lookup results, or different types through > typedefs or using declarations, or due to a different point of > instantiation in two translation units. > > > > Unlike existing approaches (the ODR checker in the gold linker for > example), clang lets us do this with no false positives and very few false > negatives. The basis of the idea is that we produce a hash of all the > ODR-relevant pieces, and to try to pick the largest possible granularity. > By granularity I mean that we would hash the entire definition of a class > including all methods defined lexically inline and emit a single value for > that class. > > > > The first step is to build a new visitor over the clang AST that > calculates a hash of the ODR-relevant pieces of the code. (StmtProfiler > doesn’t work here because it includes pointers addresses which will be > different across different translation units.) 
Hash the outermost > declaration with external-linkage. For example, given a class with a method > defined inline, we start the visitor at the class, not at the method. The > entirety of the class must be ODR-equivalent across two translation units, > including any inline methods. > > > > Although the standard mentions that the tokens must be the same, we do > not actually include the tokens in the hash. The structure of the AST > includes everything about the code which is semantically relevant. Any > false positives that would be fixed by hashing the tokens either do not > impact the behaviour of the program or could be fixed by hashing more of > the AST. References to globals should be hashed by name, but references to > locals should be hashed by an ordinal number. > > > > Instantiated templates are also visited by the hashing visitor. If we > did not, we would have false negatives where the code is not conforming due > to different points of instantiation in two translation units. We can skip > uninstantiated templates since they don’t affect the behaviour of the > program, and we need to visit the instantiations regardless. > > > > In LLVM IR, create a new named metadata node !llvm.odr_checking which > contains a list of pairs. The names do not > necessarily correspond to symbols, for instance, a class will have a hash > value but does not have a corresponding symbol. For ease of implementation, > names should be mangled per the C++ Itanium ABI (demanglable with c++filt > -t). Merging modules that contain these will need to do ODR checking as > part of that link, and the resulting module will have the union of these > tables. > > > > In the .o file, emit a sorted table of in a > non-loadable section intended to be read by the linker. All entries in the > table must be checked if any symbol from this .o file is involved in the > link (note that there is no mapping from symbol to odr table name). 
If two > .o files contain different hash values for the same name, we have detected > an ODR violation and issue a diagnostic. > > > > Finally, teach the loader (RuntimeDyld) to do verification and catch ODR > violations when dlopen'ing a shared library. > > This is the right basic design, but I'm curious why you're suggesting that > the payload should just be a hash instead of an arbitrary string. What are you suggesting goes into this string? > This isn't going to be performant enough to do unconditionally at every > load no matter how much you shrink it. > Every load of a shared object? That's not a fast operation even without odr checking, but the idea is to keep the total number of entries in the odr table small. It's less than the number of symbols, closer to the number of top-level decls. I feel maybe I'm not understanding what you meant here? Also, you should have something analogous to symbol visibility as a way to > tell the static linker that something only needs to be ODR-checked within a > linkage unit. It would be informed by actual symbol visibility, of course. > Great point, and that needs to flow into the .o files as well. If a class has one visibility and its method has another, we want to skip the method when hashing the class, and need to emit an additional entry for the method alone? Is that right? You should expect that there may be multiple hashing schemes (or versions > thereof) in play and therefore build a simple prefixing scheme on your ODR > symbols. We could put the choice of hashing algorithm in the name of the llvm named metadata node, and in the name of the section in the .o files. Nick -------------- next part -------------- An HTML attachment was scrubbed... URL: From echristo at gmail.com Thu Jul 11 18:26:04 2013 From: echristo at gmail.com (Eric Christopher) Date: Thu, 11 Jul 2013 18:26:04 -0700 Subject: [LLVMdev] John the Ripper in the test suite? In-Reply-To: References: Message-ID: Hey thanks. I'd always meant to get that in. 
:) Might want to check export restrictions as well. On Jul 11, 2013 5:03 PM, "Chris Matthews" wrote: > I am looking at adding some tests based on John the Ripper to the test > suite repository. > > http://www.openwall.com/john/ > > Does anyone have a problem with this? > > Are there specific algorithms people would like to see benchmarked? > > Thx > > Chris Matthews > chris.matthews@.com > (408) 783-6335 > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eirc.lew at gmail.com Thu Jul 11 19:45:05 2013 From: eirc.lew at gmail.com (Eric Lu) Date: Fri, 12 Jul 2013 10:45:05 +0800 Subject: [LLVMdev] How to recognize the declaring code scopes of stack variables Message-ID: Hi, If I want to know where the stack variables are declared? For example, whether it is declared within a loop or not? Like variables a[100] and temp. int a[100]; for( int i = 0; i < N; i++){ int temp; } Can this be done in LLVM IR? Or should be implemented in Clang. Thanks! Eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From dblaikie at gmail.com Thu Jul 11 19:57:24 2013 From: dblaikie at gmail.com (David Blaikie) Date: Thu, 11 Jul 2013 19:57:24 -0700 Subject: [LLVMdev] How to recognize the declaring code scopes of stack variables In-Reply-To: References: Message-ID: Clang can/now emits lifetime intrinsics to mark this information, I believe - but I'm not sure if they'll suit your needs. What are you trying to do with this information? On Thu, Jul 11, 2013 at 7:45 PM, Eric Lu wrote: > Hi, > > If I want to know where the stack variables are declared? For example, > whether it is declared within a loop or not? Like variables a[100] and > temp. > > int a[100]; > for( int i = 0; i < N; i++){ > int temp; > } > > Can this be done in LLVM IR? 
Or should be implemented in Clang. > > > Thanks! > > > Eric > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From eirc.lew at gmail.com Thu Jul 11 20:00:22 2013 From: eirc.lew at gmail.com (eric.lew) Date: Thu, 11 Jul 2013 20:00:22 -0700 (PDT) Subject: [LLVMdev] Regarding scope information for variable declaration. In-Reply-To: <1343657370.33274.YahooMailNeo@web165005.mail.bf1.yahoo.com> References: <1343657370.33274.YahooMailNeo@web165005.mail.bf1.yahoo.com> Message-ID: <1373598022502-59268.post@n5.nabble.com> I have the same need. Have you resolved this problem? If so, would you share the solution with me? Best Regards. Eric -- View this message in context: http://llvm.1065342.n5.nabble.com/Regarding-scope-information-for-variable-declaration-tp47707p59268.html Sent from the LLVM - Dev mailing list archive at Nabble.com. From dblaikie at gmail.com Thu Jul 11 20:23:42 2013 From: dblaikie at gmail.com (David Blaikie) Date: Thu, 11 Jul 2013 20:23:42 -0700 Subject: [LLVMdev] How to recognize the declaring code scopes of stack variables In-Reply-To: References: Message-ID: On Thu, Jul 11, 2013 at 8:19 PM, Eric Lu wrote: > When parallelize the loop with OpenMP like models, I need to know what > variables will be shared among different threads. I imagined you'd want to recognize loops, not scopes.
>> >> What are you trying to do with this information? >> >> On Thu, Jul 11, 2013 at 7:45 PM, Eric Lu wrote: >> > Hi, >> > >> > If I want to know where the stack variables are declared? For example, >> > whether it is declared within a loop or not? Like variables a[100] and >> > temp. >> > >> > int a[100]; >> > for( int i = 0; i < N; i++){ >> > int temp; >> > } >> > >> > Can this be done in LLVM IR? Or should be implemented in Clang. >> > >> > >> > Thanks! >> > >> > >> > Eric >> > >> > _______________________________________________ >> > LLVM Developers mailing list >> > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> > > > From dblaikie at gmail.com Thu Jul 11 21:42:26 2013 From: dblaikie at gmail.com (David Blaikie) Date: Thu, 11 Jul 2013 21:42:26 -0700 Subject: [LLVMdev] How to recognize the declaring code scopes of stack variables In-Reply-To: References: Message-ID: On Thu, Jul 11, 2013 at 9:33 PM, Eric Lu wrote: > Hi, David > yes, it is similar to your description. And do you know any methods to do > this in LLVM IR? I don't know the mid-level optimizers especially well - I doubt there's a thing that does exactly what you need - but a combination of existing passes/analyses might be able to tell you what you need. Sorry I can't be more precise, it's just not my area. I imagine it's possibly more a case of treating any values that don't enter phi's after the loop as separate - and then ensuring that the values that do enter phi's after the loop body are appropriately shared. - David > > Thanks! > > Eric > > > On Fri, Jul 12, 2013 at 11:23 AM, David Blaikie wrote: >> >> On Thu, Jul 11, 2013 at 8:19 PM, Eric Lu wrote: >> > When parallelize the loop with OpenMP like models, I need to know what >> > variables will be shared among different threads. >> >> I imagined you'd want to recognize loops, not scopes. 
Sounds like a >> backend kind of optimization type thing, just detecting values that >> don't escape the loop. (this would catch cases where a variable is >> declared outside the loop but never actually used in such a way, eg: >> >> int i; >> for (...) { >> i = ...; >> ... >> } >> >> // no use of 'i' here >> >> >> >> > So I want to know whether they are declared in loop scopes. >> > >> > >> > On Fri, Jul 12, 2013 at 10:57 AM, David Blaikie >> > wrote: >> >> >> >> Clang can/now emits lifetime intrinsics to mark this information, I >> >> believe - but I'm not sure if they'll suit your needs. >> >> >> >> What are you trying to do with this information? >> >> >> >> On Thu, Jul 11, 2013 at 7:45 PM, Eric Lu wrote: >> >> > Hi, >> >> > >> >> > If I want to know where the stack variables are declared? For >> >> > example, >> >> > whether it is declared within a loop or not? Like variables a[100] >> >> > and >> >> > temp. >> >> > >> >> > int a[100]; >> >> > for( int i = 0; i < N; i++){ >> >> > int temp; >> >> > } >> >> > >> >> > Can this be done in LLVM IR? Or should be implemented in Clang. >> >> > >> >> > >> >> > Thanks! >> >> > >> >> > >> >> > Eric >> >> > >> >> > _______________________________________________ >> >> > LLVM Developers mailing list >> >> > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> >> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> > >> > >> > > > From klimek at google.com Fri Jul 12 00:27:42 2013 From: klimek at google.com (Manuel Klimek) Date: Fri, 12 Jul 2013 09:27:42 +0200 Subject: [LLVMdev] [cfe-dev] Phabricator down In-Reply-To: References: Message-ID: And we're back. Now with 20GB more we can fill up with denormalized info about the svn repo :) On Thu, Jul 11, 2013 at 9:27 PM, Manuel Klimek wrote: > Yep, sorry, we ran out of space on the instance's database volume. I'll > update this once we're back up. > > > On Thu, Jul 11, 2013 at 9:07 PM, Justin Holewinski < > justin.holewinski at gmail.com> wrote: > >> Is Phabricator down again? 
I'm starting to see the following: >> >> [Rendering Exception] Multiple exceptions during processing and rendering. >> - AphrontQueryConnectionException: Attempt to connect to phabricator at localhost failed with error #1040: Too many connections. >> - InvalidArgumentException: Argument 1 passed to AphrontView::setUser() must be an instance of PhabricatorUser, null given, called in /srv/http/phabricator/src/view/page/PhabricatorStandardPageView.php on line 197 and defined >> >> >> >> >> On Mon, Jul 8, 2013 at 8:17 AM, Manuel Klimek wrote: >> >>> We should be back up - please let me know if anything doesn't work as >>> expected... >>> >>> Cheers, >>> /Manuel >>> >>> >>> On Mon, Jul 8, 2013 at 12:15 PM, Chandler Carruth wrote: >>> >>>> Just as a tiny update, Manuel is actively working on it, but a small >>>> issue has turned into a larger issue... stay tuned... >>>> >>>> >>>> On Sun, Jul 7, 2013 at 10:11 AM, Manuel Klimek wrote: >>>> >>>>> Hi, >>>>> >>>>> unfortunately phab has gone down and I currently don't have access to >>>>> fix it up. I'll work on it first thing tomorrow, so ETA for it getting back >>>>> is roughly 14 hours. >>>>> >>>>> Sorry! >>>>> /Manuel >>>>> >>>>> >>>> >>> >>> _______________________________________________ >>> cfe-dev mailing list >>> cfe-dev at cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev >>> >>> >> >> >> -- >> >> Thanks, >> >> Justin Holewinski >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From atrick at apple.com Fri Jul 12 00:58:01 2013 From: atrick at apple.com (Andrew Trick) Date: Fri, 12 Jul 2013 00:58:01 -0700 Subject: [LLVMdev] MI Scheduler vs SD Scheduler? 
In-Reply-To: <1372800938.86583.YahooMailNeo@web125506.mail.ne1.yahoo.com> References: <1372448337.69444.YahooMailNeo@web125505.mail.ne1.yahoo.com> <1372800938.86583.YahooMailNeo@web125506.mail.ne1.yahoo.com> Message-ID: <3B65A392-1067-412C-9ABC-B4EC10ED3F6F@apple.com> On Jul 2, 2013, at 2:35 PM, Ghassan Shobaki wrote: > Thank you for the answers! We are currently trying to test the MI scheduler. We are using LLVM 3.3 with Dragon Egg 3.3 on an x86-64 machine. So far, we have run one SPEC CPU2006 test with the MI scheduler enabled using the option -fplugin-arg-dragonegg-llvm-option='-enable-misched:true' with -O3. This enables the machine scheduler in addition to the SD scheduler. We have verified this by adding print messages to the source code of both schedulers. In terms of correctness, enabling the MI scheduler did not cause any failure. However, in terms of performance, we have seen a mix of small positive and negative differences with the geometric mean difference being near zero. The maximum improvement that we have seen is 3% on the Gromacs benchmark. Is this consistent with your test results? I haven’t benchmarked fortran. On x86-64, I regularly see wild swings in performance, 10-20% for small codegen changes (small benchmarks with a primary hot loop). This is not a natural consequence of scheduling, unless spill code changed in the hot loop (rare on x86-64). Quite often, a somewhat random change in copy coalescing results in different register allocation and code layout. The results are chaotic and very platform (linker) and microarchitecture specific. Large benchmarks are immune to wild swings, but the small changes you see could just be the accumulation of chaotic behavior of individual loops. It’s hard for me to draw conclusions without looking at hardware counters and isolating the data to individual loops. 
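To make "chaotic" concrete, the kind of kernel I mean looks like this (made up for illustration, not taken from any benchmark; the two-accumulator form trades extra register pressure for ILP):

```c
#include <stddef.h>

/* Illustrative hot loop: timing on kernels like this is sensitive to
 * the final schedule, to copy coalescing, and to any spill the
 * allocator introduces in the loop body. */
static double dot(const double *a, const double *b, size_t n) {
    double s0 = 0.0, s1 = 0.0;  /* two accumulators: more ILP, more pressure */
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i] * b[i];
        s1 += a[i + 1] * b[i + 1];
    }
    if (i < n)                  /* odd-length tail element */
        s0 += a[i] * b[i];
    return s0 + s1;
}
```

Nothing about this code is special; the point is that numbers from loops this small are only meaningful when examined per-loop, with counters, rather than aggregated over a whole benchmark.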
The MI scheduler’s generic heuristics are much more about avoiding worst-case scheduling in pathological situations (very large unrolled loops) than it is about tuning for a microarchitecture. People who want to do that may want to plugin their own scheduling strategy. The precise machine model and register pressure information is all there now. The broadest statement I can make is that we should not unnecessarily spill within loops (with rare exceptions). If you see that, file a bug. I know there are still situations that we don’t handle well, but haven’t had a compelling enough reason to add the complexity to the generic heuristics. If good test cases come in, then I’ll do that. > We have then tried to run a test in which the MI scheduler is enabled but the SD scheduler is disabled (or neutralized) by adding the option: -fplugin-arg-dragonegg-llvm-option='-pre-RA-sched:source' to the flags that we have used in the first test. However, this did not work; we got the following error message: > > GCC_4.6.4_DIR/install/bin/gcc -c -o lbm.o -DSPEC_CPU -DNDEBUG -O3 -march=core2 -mtune=core2 -fplugin='DRAGON_EGG_DIR/dragonegg.so' -fplugin-arg-dragonegg-llvm-option='-enable-misched:true' -fplugin-arg-dragonegg-llvm-option='-pre-RA-sched:source' -DSPEC_CPU_LP64 lbm.c > cc1: for the -pre-RA-sched option: may only occur zero or one times! > specmake: *** [lbm.o] Error 1 > > What does this message mean? > Is this a bug or we are doing something wrong? I’m not sure why the driver is telling you this. Maybe someone familiar with dragonegg can help? You can always rebuild llvm with the enableMachineScheduler() hook implemented. http://article.gmane.org/gmane.comp.compilers.llvm.devel/63242/match=machinescheduler Then -enable-misched=true/false simply toggles MI Sched without changing anything else. > How can we test the MI scheduler by itself? > Is it interesting to test 3.3 or there are interesting features that were added to the trunk after branching 3.3? 
In the latter case, we are willing to test the trunk. It doesn’t look like my June checkins made it into 3.3. If you’re enabling MI Sched, and actually evaluating performance of the default heuristics, then it’s best to use trunk. -Andy > > Thanks > > Ghassan Shobaki > Assistant Professor > Department of Computer Science > Princess Sumaya University for Technology > Amman, Jordan > > > From: Andrew Trick > To: Ghassan Shobaki > Cc: "llvmdev at cs.uiuc.edu" > Sent: Monday, July 1, 2013 8:10 PM > Subject: Re: MI Scheduler vs SD Scheduler? > > > Sent from my iPhone > > On Jun 28, 2013, at 2:38 PM, Ghassan Shobaki wrote: > >> Hi, >> >> We are currently in the process of upgrading from LLVM 2.9 to LLVM 3.3. We are working on instruction scheduling (mainly for register pressure reduction). I have been following the llvmdev mailing list and have learned that a machine instruction (MI) scheduler has been implemented to replace (or work with?) the selection DAG (SD) scheduler. However, I could not find any document that describes the new MI scheduler and how it differs from and relates to the SD scheduler. > > MI is now the place to implement any heuristics for profitable scheduling. SD scheduler will be directly replaced by a new pass that orders the DAG as close as it can to IR order. We currently emulate this with -pre-RA-sched=source. > The only thing necessarily different about MI sched is that it runs after reg coalescing and before reg alloc, and maintains live interval analysis. As a result, register pressure tracking is more accurate. It also uses a new target interface for precise register pressure. > MI sched is intended to be a convenient place to implement target specific scheduling. There is a generic implementation that uses standard heuristics to reduce register pressure and balance latency and CPU resources. That is what you currently get when you enable MI sched for x86. 
> The generic heuristics are implemented as a priority function that makes a greedy choice over the ready instructions based on the current pressure and the resources and latency of the scheduled and unscheduled set of instructions. > An DAG subtree analysis also exists (ScheduleDFS), which can be used for register pressure avoidance. This isn't hooked up to the generic heuristics yet for lack of interesting test cases. > >> So, I would appreciate any pointer to a document (or a blog) that may help us understand the difference and the relation between the two schedulers and figure out how to deal with them. We are trying to answer the following questions: >> >> - A comment at the top of the file ScheduleDAGInstrs says that this file implements re-scheduling of machine instructions. So, what does re-scheduling mean? > > Rescheduling just means optional scheduling. That's really what the comment should say. It's important to know that MI sched can be skipped for faster compilation. > >> Does it mean that the real scheduling algorithms (such as reg pressure reduction) are currently implemented in the SD scheduler, while the MI scheduler does some kind of complementary work (fine tuning) at a lower level representation of the code? >> And what's the future plan? Is it to move the real scheduling algorithms into the MI scheduler and get rid of the SD scheduler? Will that happen in 3.4 or later? > > I would like to get rid of the SD scheduler so we can reduce compile time by streamline the scheduling data structures and interfaces. There may be some objection to doing that in 3.4 if projects haven't been able to migrate. It will be deprecated though. > >> >> - Based on our initial investigation of the default behavior at -O3 on x86-64, it appears that the SD scheduler is called while the MI scheduler is not. That's consistent with the above interpretation of re-scheduling, but I'd appreciate any advice on what we should do at this point. 
Should we integrate our work (an alternate register pressure reduction scheduler) into the SD scheduler or the MI scheduler? > > Please refer to my recent messages on llvmdev regarding enabling MI scheduling by default on x86. > http://article.gmane.org/gmane.comp.compilers.llvm.devel/63242/match=machinescheduler > > I suggest integrating with the MachineScheduler pass. > There are many places to plug in. MachineSchedRegistry provides the hook. At that point you can define your own ScheduleDAGInstrs or ScheduleDAGMI subclass. People who only want to define new heuristics should reuse ScheduleDAGMI directly and only define their own MachineSchedStrategy. > >> >> - Our SPEC testing on x86-64 has shown a significant performance improvement of LLVM 3.3 relative to LLVM 2.9 (about 5% in geomean on INT2006 and 15% in geomean on FP2006), but our spill code measurements have shown that LLVM 3.3 generates significantly more spill code on most benchmarks. We will be doing more investigation on this, but are there any known facts that explain this behavior? Is this caused by a known regression in scheduling and/or allocation (which I doubt) or by the implementation (or enabling) of some new optimization(s) that naturally increase(s) register pressure? >> > There is not a particular known regression. It's not surprising that optimizations increase pressure. > > Andy > >> Thank you in advance! >> >> Ghassan Shobaki >> Assistant Professor >> Department of Computer Science >> Princess Sumaya University for Technology >> Amman, Jordan -------------- next part -------------- An HTML attachment was scrubbed... URL: From anton at korobeynikov.info Fri Jul 12 01:31:45 2013 From: anton at korobeynikov.info (Anton Korobeynikov) Date: Fri, 12 Jul 2013 12:31:45 +0400 Subject: [LLVMdev] John the Ripper in the test suite? In-Reply-To: References: Message-ID: > Hey thanks. I'd always meant to get that in. :) > > Might want to check export restrictions as well. Right. 
This was the reason why OpenSSL was removed from the testsuite. -- With best regards, Anton Korobeynikov Faculty of Mathematics and Mechanics, Saint Petersburg State University From criswell at illinois.edu Fri Jul 12 07:35:14 2013 From: criswell at illinois.edu (John Criswell) Date: Fri, 12 Jul 2013 09:35:14 -0500 Subject: [LLVMdev] John the Ripper in the test suite? In-Reply-To: References: Message-ID: <51E01422.1060501@illinois.edu> On 7/12/13 3:31 AM, Anton Korobeynikov wrote: >> Hey thanks. I'd always meant to get that in. :) >> >> Might want to check export restrictions as well. > Right. This was the reason why OpenSSL was removed from the testsuite. Agreed. Please do not add any crypto stuff to the test suite. The regulations on crypto are unclear and have, in the past, been subject to change. Hosting crypto is a headache that I don't want to deal with. -- John T. > > -- > With best regards, Anton Korobeynikov > Faculty of Mathematics and Mechanics, Saint Petersburg State University > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From t.p.northover at gmail.com Fri Jul 12 07:42:41 2013 From: t.p.northover at gmail.com (Tim Northover) Date: Fri, 12 Jul 2013 15:42:41 +0100 Subject: [LLVMdev] John the Ripper in the test suite? In-Reply-To: <51E01422.1060501@illinois.edu> References: <51E01422.1060501@illinois.edu> Message-ID: > Agreed. Please do not add any crypto stuff to the test suite. The > regulations on crypto are unclear and have, in the past, been subject to > change. Hosting crypto is a headache that I don't want to deal with. Do you know if we've got much that's specifically intended as a proxy for crypto? It's quite an important use-case in the real world. Tim. 
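For what it's worth, a proxy wouldn't need any actual cryptographic code. An ARX-style (add-rotate-xor) mixing kernel exercises much of the same instruction mix as real ciphers; the sketch below is untested, and its round structure and rotation constants are invented here, so it is not any real algorithm:

```c
#include <stdint.h>

/* Hypothetical ARX workload proxy: the add/rotate/xor mix of many
 * ciphers, but deliberately not any real algorithm, so it carries
 * none of the export-restriction baggage. */
static uint32_t rotl32(uint32_t x, unsigned r) {
    return (x << r) | (x >> (32u - r));  /* r must stay in 1..31 */
}

static uint32_t arx_mix(uint32_t a, uint32_t b, int rounds) {
    for (int i = 0; i < rounds; i++) {
        a += b;
        b = rotl32(b, 7) ^ a;
        a = rotl32(a, 13) + (uint32_t)i;
    }
    return a ^ b;
}
```

A harness would just run arx_mix over a buffer and compare the result against a precomputed value, which gives a correctness check for free.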
From swlin at post.harvard.edu Fri Jul 12 07:53:24 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Fri, 12 Jul 2013 07:53:24 -0700 Subject: [LLVMdev] Bikeshedding a name for new directive: CHECK-LABEL vs. CHECK-BOUNDARY vs. something else. In-Reply-To: References: Message-ID: OK, it was two votes to one so I went with CHECK-LABEL, r186162. On Thu, Jul 11, 2013 at 1:41 PM, Stephen Lin wrote: > Hi, > > I would like to add a new directive to FileCheck called CHECK-FOO > (where FOO is a name under discussion right now) which is used to > improve error messages. The idea is that you would use CHECK-FOO on > any line that contains a unique identifier (typically labels, function > definitions, etc.) that is guaranteed to only occur once in the file; > FileCheck will then conceptually break the break the input into blocks > separated by these unique identifier lines and perform all other > checks localized to between the appropriate blocks; it can ever > recover from an error in one block and move on to another. > > As an example, I purposely introduced the a switch fall-through bug in > the last patch I submitted to llvm-commits ("Allow FMAs in safe math > mode in some cases when one operand of the fmul is either exactly 0.0 > or exactly 1.0")... 
> > Bug diff: > > diff --git a/lib/CodeGen/SelectionDAG/DAGCombiner.cpp > b/lib/CodeGen/SelectionDAG/DAGCombiner.cpp > index 0290afc..239b119 100644 > --- a/lib/CodeGen/SelectionDAG/DAGCombiner.cpp > +++ b/lib/CodeGen/SelectionDAG/DAGCombiner.cpp > @@ -5791,7 +5791,7 @@ static bool isExactlyZeroOrOne(const > TargetLowering &TLI, const SDValue &Op) { > continue; > } > } > - break; > +// break; > case ISD::FADD: > if (ConstantFPSDNode *V0CFP = > dyn_cast(V->getOperand(0))) { > > The single error message without CHECK-FOO is: > > ; CHECK: test_add_8 > ^ > :125:2: note: scanning from here > .cfi_endproc > ^ > :127:10: note: possible intended match here > .globl _test_add_10 > ^ > > The error messages with CHECK-FOO on the function name label lines are: > > ; CHECK: vmulsd > ^ > :87:2: note: scanning from here > .align 4, 0x90 > ^ > :95:2: note: possible intended match here > vsubsd %xmm0, %xmm3, %xmm0 > ^ > fp-contract.ll:118:15: error: expected string not found in input > ; CHECK: vmulsd > ^ > :102:2: note: scanning from here > .align 4, 0x90 > ^ > :109:2: note: possible intended match here > vsubsd %xmm2, %xmm3, %xmm2 > ^ > fp-contract.ll:288:15: error: expected string not found in input > ; CHECK: vmulsd > ^ > :258:2: note: scanning from here > .align 4, 0x90 > ^ > :266:2: note: possible intended match here > vsubsd %xmm0, %xmm3, %xmm0 > ^ > > Does anyone have a suggestions on what FOO should be? In my current > patch it's currently LABEL, but Eli. B. suggested BOUNDARY > > Any opinions or other suggestions? > > Thanks, > Stephen > > On Thu, Jul 11, 2013 at 1:33 PM, Stephen Lin wrote: >> It's just short for BOUNDARY. I think BOUNDARY is too long :D >> I prefer LABEL though. I can send this to the dev list and ask for >> opinions there. >> Stephen >> >> On Thu, Jul 11, 2013 at 12:54 PM, Eli Bendersky wrote: >>> >>> On Thu, Jul 11, 2013 at 12:44 PM, Stephen Lin >>> wrote: >>>> >>>> Actually, I would be ok with CHECK-BOUND as well. >>>> Eli, is that OK to you? 
And does anyone else want to chime in? >>>> I will expand the docs either way. >>>> Thanks, >>>> Stephen >>> >>> >>> I'm not sure what BOUND means in this case? And how is it different from >>> BOUNDARY? >>> >>> I'm just thinking of someone reading the test file and looking at all the >>> directives. BOUNDARY conveys a spatial meaning and it's easy to intuitively >>> remember what its semantics are. My opposition to LABEL was because LABEL >>> conveyed no such meaning and I think it would be confusing. As for BOUND vs. >>> BOUNDARY, that's really a minor issue and perhaps my knowledge of English >>> fails me here, but I'd be happy to hear the reasoning. >>> >>> Eli >>> >>> >>> >>> >>> >>>> >>>> >>>> On Thu, Jul 11, 2013 at 12:40 PM, Stephen Lin >>>> wrote: >>>> > Thanks Owen; Andy (Trick) off-list says he thinks it's a good idea, too. >>>> > >>>> > Eli B. (also off-list) thinks that the documentation can be approved >>>> > and also suggests that the name CHECK-BOUNDARY is better. Anyone else >>>> > have an opinion? >>>> > >>>> > I much prefer CHECK-LABEL to CHECK-BOUNDARY myself, but I am willing >>>> > to paint the bike shed whatever color others can agree on. >>>> > >>>> > Stephen >>>> > >>>> > On Thu, Jul 11, 2013 at 12:31 PM, Owen Anderson >>>> > wrote: >>>> >> I'm not familiar enough with the FileCheck internals to comment on the >>>> >> implementation, but I *really* like this feature. I've spent way too much >>>> >> time over the years tracking down cryptic FileCheck errors that would have >>>> >> been solved by this. >>>> >> >>>> >> --Owen >>>> >> >>>> >> On Jul 11, 2013, at 10:50 AM, Stephen Lin >>>> >> wrote: >>>> >> >>>> >>> Hi, >>>> >>> >>>> >>> Can anyone review this patch? It adds a new directive type called >>>> >>> "CHECK-LABEL" to FileCheck... 
>>>> >>> >>>> >>> If present in a match file, FileCheck will use these directives to >>>> >>> split the input into blocks that are independently processed, ensuring >>>> >>> that a CHECK does not inadvertently match a line in a different block >>>> >>> (which can lead to a misleading/useless error message when the error >>>> >>> is eventually caught). Also, FileCheck can now recover from errors >>>> >>> within blocks by continuing to the next block. >>>> >>> >>>> >>> As an example, I purposely introduced the a switch fall-through bug in >>>> >>> the last patch I submitted to llvm-commits ("Allow FMAs in safe math >>>> >>> mode in some cases when one operand of the fmul is either exactly 0.0 >>>> >>> or exactly 1.0")... >>>> >>> >>>> >>> Bug diff: >>>> >>> >>>> >>> diff --git a/lib/CodeGen/SelectionDAG/DAGCombiner.cpp >>>> >>> b/lib/CodeGen/SelectionDAG/DAGCombiner.cpp >>>> >>> index 0290afc..239b119 100644 >>>> >>> --- a/lib/CodeGen/SelectionDAG/DAGCombiner.cpp >>>> >>> +++ b/lib/CodeGen/SelectionDAG/DAGCombiner.cpp >>>> >>> @@ -5791,7 +5791,7 @@ static bool isExactlyZeroOrOne(const >>>> >>> TargetLowering &TLI, const SDValue &Op) { >>>> >>> continue; >>>> >>> } >>>> >>> } >>>> >>> - break; >>>> >>> +// break; >>>> >>> case ISD::FADD: >>>> >>> if (ConstantFPSDNode *V0CFP = >>>> >>> dyn_cast(V->getOperand(0))) { >>>> >>> >>>> >>> The single error message without CHECK-LABEL is: >>>> >>> >>>> >>> ; CHECK-SAFE: test_add_8 >>>> >>> ^ >>>> >>> :125:2: note: scanning from here >>>> >>> .cfi_endproc >>>> >>> ^ >>>> >>> :127:10: note: possible intended match here >>>> >>> .globl _test_add_10 >>>> >>> ^ >>>> >>> >>>> >>> The error messages with CHECK-LABEL are: >>>> >>> >>>> >>> ; CHECK-SAFE: vmulsd >>>> >>> ^ >>>> >>> :87:2: note: scanning from here >>>> >>> .align 4, 0x90 >>>> >>> ^ >>>> >>> :95:2: note: possible intended match here >>>> >>> vsubsd %xmm0, %xmm3, %xmm0 >>>> >>> ^ >>>> >>> fp-contract.ll:118:15: error: expected string not found in input >>>> >>> ; 
CHECK-SAFE: vmulsd >>>> >>> ^ >>>> >>> :102:2: note: scanning from here >>>> >>> .align 4, 0x90 >>>> >>> ^ >>>> >>> :109:2: note: possible intended match here >>>> >>> vsubsd %xmm2, %xmm3, %xmm2 >>>> >>> ^ >>>> >>> fp-contract.ll:288:15: error: expected string not found in input >>>> >>> ; CHECK-SAFE: vmulsd >>>> >>> ^ >>>> >>> :258:2: note: scanning from here >>>> >>> .align 4, 0x90 >>>> >>> ^ >>>> >>> :266:2: note: possible intended match here >>>> >>> vsubsd %xmm0, %xmm3, %xmm0 >>>> >>> ^ >>>> >>> >>>> >>> The three error messages in the CHECK-LABEL case exactly pinpoint the >>>> >>> source lines of the actual problem in three separate blocks; the >>>> >>> single error message given without CHECK-LABEL is (imho) much less >>>> >>> useful. >>>> >>> >>>> >>> (In this case, the non-CHECK-LABEL version happens to error on the on >>>> >>> a label line, so the user could presume that the error happened in the >>>> >>> block immediately before test_add_8, which is correct, but in general >>>> >>> this might not be true; the only thing that can be concluded is that >>>> >>> the error happened sometime before test_add_8.) >>>> >>> >>>> >>> Please let me know if you have any feedback. 
>>>> >>> >>>> >>> Stephen >>>> >>> >>>> >>> ---------- Forwarded message ---------- >>>> >>> From: Stephen Lin >>>> >>> Date: Mon, Jun 10, 2013 at 4:21 PM >>>> >>> Subject: [PATCH] Add CHECK-LABEL directive to FileCheck to allow more >>>> >>> accurate error messages and error recovery >>>> >>> To: llvm-commits at cs.uiuc.edu >>>> >>> >>>> >>> >>>> >>> Actually, I went ahead and renamed it CHECK-LABEL and rebased, since I >>>> >>> think it’s better :) >>>> >>> -Stephen >>>> >>> _______________________________________________ >>>> >>> llvm-commits mailing list >>>> >>> llvm-commits at cs.uiuc.edu >>>> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits >>>> >> >>>> >> >>>> >> _______________________________________________ >>>> >> llvm-commits mailing list >>>> >> llvm-commits at cs.uiuc.edu >>>> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits >>> >>> From david.tweed at gmail.com Fri Jul 12 08:07:58 2013 From: david.tweed at gmail.com (David Tweed) Date: Fri, 12 Jul 2013 16:07:58 +0100 Subject: [LLVMdev] John the Ripper in the test suite? In-Reply-To: References: <51E01422.1060501@illinois.edu> Message-ID: On Fri, Jul 12, 2013 at 3:42 PM, Tim Northover wrote: > > Agreed. Please do not add any crypto stuff to the test suite. The > > regulations on crypto are unclear and have, in the past, been subject to > > change. Hosting crypto is a headache that I don't want to deal with. > > Do you know if we've got much that's specifically intended as a proxy > for crypto? It's quite an important use-case in the real world. > There's the various signing routines in MiBench in the testsuite. They aren't encrypting anything, just verifying integrity. -- cheers, dave tweed__________________________ high-performance computing and machine vision expert: david.tweed at gmail.com "while having code so boring anyone can maintain it, use Python." -- attempted insult seen on slashdot -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From justin.holewinski at gmail.com Fri Jul 12 09:39:30 2013
From: justin.holewinski at gmail.com (Justin Holewinski)
Date: Fri, 12 Jul 2013 12:39:30 -0400
Subject: [LLVMdev] Clarification on alloca semantics
Message-ID:

According to the language reference:

*The 'alloca' instruction allocates memory on the stack frame of the
currently executing function, to be automatically released when this
function returns to its caller. The object is always allocated in the
generic address space (address space zero).*

The last sentence specifies where the memory is allocated, but it's not
clear precisely what "allocated in the generic address space" means. For
architectures with a segmented memory layout, allocating "in" the generic
address space may not make sense. Instead, you allocate in a specific
address space and create a "generic" pointer to that allocation. Is this
a legal interpretation of the language reference? If so, I would like to
make the text in the language reference more explicit that this is
allowed. If not, I would like to petition to make it legal. Something
along the lines of:

*The object is allocated in a target-defined memory space and the
returned pointer is always in the generic address space (address space
zero).*

--
Thanks,
Justin Holewinski
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From alexanius at gmail.com Fri Jul 12 12:08:06 2013
From: alexanius at gmail.com (Alex Markin)
Date: Fri, 12 Jul 2013 19:08:06 +0000
Subject: [LLVMdev] Break in loop expression-3
Message-ID:

Hello everyone.

I've noticed a difference in gcc and llvm behaviour with the following
code:

$ cat test.c
#include <stdio.h>
int main()
{
    for(int i = 0;; ({break;}))
        printf("Hello, world\n");
}

$ clang test.c -pedantic && ./a.out
test.c:5:22: warning: use of GNU statement expression extension [-Wgnu]
    for(int i = 0;; ({break;}))
                     ^
1 warning generated.
Hello, world

$ gcc test.c -std=gnu11 -pedantic && ./a.out
test.c: In function 'main':
test.c:5:23: error: break statement not within loop or switch
    for(int i = 0;; ({break;}))
                      ^
test.c:5:21: warning: ISO C forbids braced-groups within expressions [-Wpedantic]
    for(int i = 0;; ({break;}))

I asked the gcc developers about this; they answered that gcc is correct
here, because statement expressions don't inherit the surrounding
context. So, it seems to be an llvm bug.

Kind regards,
Markin Alex

From elena.demikhovsky at intel.com Fri Jul 12 13:12:35 2013
From: elena.demikhovsky at intel.com (Demikhovsky, Elena)
Date: Fri, 12 Jul 2013 20:12:35 +0000
Subject: [LLVMdev] LLVM x86 backend for Intel MIC : trying it out and questions
In-Reply-To:
References:
Message-ID:

Hello Dmitry,

I'm working on the KNL backend and plan to push it to open source once
the ISA becomes public. We do not plan to support the KNC architecture in
open source.

- Elena

-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Dmitry Mikushin
Sent: Friday, July 12, 2013 01:51
To: LLVM Developers Mailing List
Subject: [LLVMdev] LLVM x86 backend for Intel MIC : trying it out and questions

Dear all,

I'm interested to analyse what could be done with the current LLVM trunk
to deliver basic Intel MIC support. Let's say, for a basic level, we'd
want just scalar code execution: no threading, no zmm vectors. The
attached patch is verbose in text but functionally very simple: it
copy-pastes the x86 and x86_64 backends into 32-bit and 64-bit K1OM. At
the end of the message you can find how simple LLVM-generated programs
can be compiled and executed on a MIC device using this patch.

Could you please help find answers to the following questions:

1) Is there actually a 32-bit mode for MIC? 32-bit ELFs are not
recognized, so...
2) Is the MIC ISA a 32-bit ISA (no SSE/MMX) plus 256-bit AVX-like
vectors?
3) If 1 is "no" and 2 is "yes", then does the MIC calling convention
permit generation of programs that use only the 32-bit x86 ISA? In other
words, in the common case, does the calling convention require use of zmm
registers (e.g. to return a double value) even in scalar programs?

Thanks,
- D.

===

$ cat hello.c
#include <stdio.h>

int main()
{
    printf("Hello, Intel MIC!\n");
    return 0;
}

$ PATH=$PATH:~/rpmbuild/CHROOT/opt/kernelgen/usr/bin clang -emit-llvm -c hello.c -o - | PATH=$PATH:~/forge/llvm/install/bin/ opt -O3 -S -o hello.ll
$ cat hello.ll
; ModuleID = '<stdin>'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@str = private unnamed_addr constant [18 x i8] c"Hello, Intel MIC!\00"

; Function Attrs: nounwind uwtable
define i32 @main() #0 {
entry:
  %puts = tail call i32 @puts(i8* getelementptr inbounds ([18 x i8]* @str, i64 0, i64 0))
  ret i32 0
}

; Function Attrs: nounwind
declare i32 @puts(i8* nocapture) #1

attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { nounwind }

$ PATH=$PATH:~/forge/llvm/install/bin/ llc hello.ll -march=k1om64 -filetype=obj -o hello.mic.o
$ objdump -d hello.mic.o

hello.mic.o:     file format elf64-k1om

Disassembly of section .text:

0000000000000000 <main>:
   0:  55                    push   %rbp
   1:  48 89 e5              mov    %rsp,%rbp
   4:  bf 00 00 00 00        mov    $0x0,%edi
   9:  e8 00 00 00 00        callq  e <main+0xe>
   e:  31 c0                 xor    %eax,%eax
  10:  5d                    pop    %rbp
  11:  c3                    retq

$ icc -mmic hello.mic.o -o hello
x86_64-k1om-linux-ld: error in hello.mic.o(.eh_frame); no .eh_frame_hdr table will be created.

$ /opt/intel/mic/bin/micnativeloadex ./hello
Hello, Intel MIC!
---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

From silvas at purdue.edu Fri Jul 12 13:38:05 2013
From: silvas at purdue.edu (Sean Silva)
Date: Fri, 12 Jul 2013 13:38:05 -0700
Subject: [LLVMdev] Clarification on alloca semantics
In-Reply-To:
References:
Message-ID:

Is there any reason why it is saying "generic address space (address
space zero)" and not just "address space zero"?

-- Sean Silva
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dmitry at kernelgen.org Fri Jul 12 13:44:00 2013
From: dmitry at kernelgen.org (Dmitry Mikushin)
Date: Fri, 12 Jul 2013 22:44:00 +0200
Subject: [LLVMdev] LLVM x86 backend for Intel MIC : trying it out and questions
In-Reply-To:
References:
Message-ID:

Hello Elena,

Thanks for the info! Since Knights Landing (KNL) is also going to be
shipped in the form of a host CPU, it will have to have open-source
support :) But given that KNL was only announced a month ago, we should
expect up to 1.5 years for it to become somewhat widespread, i.e.
2014-2015. Meanwhile, I still hope to perform some KNC evaluation, so
answers to the above questions are much appreciated!

Best,
- D.

2013/7/12 Demikhovsky, Elena <elena.demikhovsky at intel.com>:
> Hello Dmitry,
>
> I'm working on KNL backend and plan to push it to the open source once the ISA becomes public.
We do not plan to support KNC architecture in open source. > > - Elena > > -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Dmitry Mikushin > Sent: Friday, July 12, 2013 01:51 > To: LLVM Developers Mailing List > Subject: [LLVMdev] LLVM x86 backend for Intel MIC : trying it out and questions > > Dear all, > > I'm interested to analyse what could be done with current LLVM trunk to deliver basic Intel MIC support. Let's say, for basic level we'd want just scalar code execution, no threading, no zmm vectors. > Attached verbose in text, but functionally very simple patch copy-pastes x86 and x86_64 backends into 32-bit and 64-bit K1OM. In the end of the message you can find how simple LLVM-generated programs could be compiled & executed on MIC device, using this patch. > > Could you please help finding answers to the following questions: > > 1) Is there actually a 32-bit mode for MIC? 32-bit ELFs are not recognized, so... > 2) MIC ISA is 32-bit ISA (no SSE/MMX) plus 256-bit AVX-like vectors? > 3) If 1 is "no" and 2 is "yes", then does MIC calling convention permit generation of programs that use only 32-bit x86 ISA? In other words, in common case, does calling convention require use of zmm registers (e.g. return double value) even in scalar programs? > > Thanks, > - D. 
> > === > > $ cat hello.c > #include > > int main() > { > printf("Hello, Intel MIC!\n"); > return 0; > } > > $ PATH=$PATH:~/rpmbuild/CHROOT/opt/kernelgen/usr/bin clang -emit-llvm -c hello.c -o - | PATH=$PATH:~/forge/llvm/install/bin/ opt -O3 -S -o hello.ll $ cat hello.ll ; ModuleID = '' > target datalayout = > "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128" > target triple = "x86_64-unknown-linux-gnu" > > @str = private unnamed_addr constant [18 x i8] c"Hello, Intel MIC!\00" > > ; Function Attrs: nounwind uwtable > define i32 @main() #0 { > entry: > %puts = tail call i32 @puts(i8* getelementptr inbounds ([18 x i8]* @str, i64 0, i64 0)) > ret i32 0 > } > > ; Function Attrs: nounwind > declare i32 @puts(i8* nocapture) #1 > > attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" > "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" > "no-infs-fp-math"="false" "no-nans-fp-math"="false" > "unsafe-fp-math"="false" "use-soft-float"="false" } attributes #1 = { nounwind } > > $ PATH=$PATH:~/forge/llvm/install/bin/ llc hello.ll -march=k1om64 -filetype=obj -o hello.mic.o $ objdump -d hello.mic.o > > hello.mic.o: file format elf64-k1om > > > Disassembly of section .text: > > 0000000000000000
: > 0: 55 push %rbp > 1: 48 89 e5 mov %rsp,%rbp > 4: bf 00 00 00 00 mov $0x0,%edi > 9: e8 00 00 00 00 callq e > e: 31 c0 xor %eax,%eax > 10: 5d pop %rbp > 11: c3 retq > > $ icc -mmic hello.mic.o -o hello > x86_64-k1om-linux-ld: error in hello.mic.o(.eh_frame); no .eh_frame_hdr table will be created. > > $ /opt/intel/mic/bin/micnativeloadex ./hello Hello, Intel MIC! > --------------------------------------------------------------------- > Intel Israel (74) Limited > > This e-mail and any attachments may contain confidential material for > the sole use of the intended recipient(s). Any review or distribution > by others is strictly prohibited. If you are not the intended > recipient, please contact the sender and delete all copies. > From eli.friedman at gmail.com Fri Jul 12 13:59:09 2013 From: eli.friedman at gmail.com (Eli Friedman) Date: Fri, 12 Jul 2013 13:59:09 -0700 Subject: [LLVMdev] Break in loop expression-3 In-Reply-To: References: Message-ID: On Fri, Jul 12, 2013 at 12:08 PM, Alex Markin wrote: > Hello everyone. > > I've noticed the difference in gcc and llvm behaviour with the following code: > > $ cat test.c > #include > int main() > { > for(int i = 0;; ({break;})) > printf("Hello, world\n"); > } > > $ clang test.c -pedantic && ./a.out > test.c:5:22: warning: use of GNU statement expression extension [-Wgnu] > for(int i = 0;; ({break;})) > ^ > 1 warning generated. > Hello, world > > $ gcc test.c -std=gnu11 -pedantic && ./a.out > test.c: In function 'main': > test.c:5:23: error: break statement not within loop or switch > for(int i = 0;; ({break;})) > ^ > test.c:5:21: warning: ISO C forbids braced-groups within expressions > [-Wpedantic] > for(int i = 0;; ({break;})) > > I asked gcc developers about that fact, they answered that gcc is > correct here, because statement expressions don't inherit the > surrounding context. That's simply wrong. > So, it seems to be an llvm bug. This is essentially the same as http://llvm.org/bugs/show_bug.cgi?id=8880 . 
-Eli From justin.holewinski at gmail.com Fri Jul 12 14:32:00 2013 From: justin.holewinski at gmail.com (Justin Holewinski) Date: Fri, 12 Jul 2013 17:32:00 -0400 Subject: [LLVMdev] Clarification on alloca semantics In-Reply-To: References: Message-ID: Good question. As far as I know, there is no specification that dictates address space 0 is a "generic" address space for the target. On Fri, Jul 12, 2013 at 4:38 PM, Sean Silva wrote: > Is there any reason why it is saying "generic address space (address space > zero)" and not just "address space zero"? > > -- Sean Silva > -- Thanks, Justin Holewinski -------------- next part -------------- An HTML attachment was scrubbed... URL: From N.Ojeda.Bar at dpmms.cam.ac.uk Fri Jul 12 09:09:13 2013 From: N.Ojeda.Bar at dpmms.cam.ac.uk (Nicolas Ojeda Bar) Date: Fri, 12 Jul 2013 17:09:13 +0100 Subject: [LLVMdev] setjmp/longjmp exception handling: how? Message-ID: <06A5FD1E-76EB-47C1-B24A-296F35EE3FFD@dpmms.cam.ac.uk> Dear list, I want to add SJLJ exception handling to my frontend. Unfortunately, there doesn't seem to be any examples in the documentation as to how to use the intrinsics @llvm.eh.sjlj.setjmp @llvm.eh.sjlj.longjmp @llvm.eh.sjlj.lsda @llvm.eh.sjlj.callsite Is there a way to force Clang to use SJLJ exception handling for C++? That way I would be able to look at its output to learn how to use them correctly. Otherwise, how should I go about finding out how to implement SJLJ exception handling in my frontend? Thanks! Nicolas From grosbach at apple.com Fri Jul 12 15:25:54 2013 From: grosbach at apple.com (Jim Grosbach) Date: Fri, 12 Jul 2013 15:25:54 -0700 Subject: [LLVMdev] setjmp/longjmp exception handling: how? In-Reply-To: <06A5FD1E-76EB-47C1-B24A-296F35EE3FFD@dpmms.cam.ac.uk> References: <06A5FD1E-76EB-47C1-B24A-296F35EE3FFD@dpmms.cam.ac.uk> Message-ID: I strongly advise you not to use SjLj exception handling. Use zero-cost-exceptions via DWARF instead. 
If you really want to pursue sjlj anyway, look at the ARM backend. It
uses them for darwin targets.

-Jim

On Jul 12, 2013, at 9:09 AM, Nicolas Ojeda Bar <N.Ojeda.Bar at dpmms.cam.ac.uk> wrote:

> Dear list,
>
> I want to add SJLJ exception handling to my frontend. Unfortunately,
> there doesn't seem to be any examples in the documentation as to how
> to use the intrinsics
>
> @llvm.eh.sjlj.setjmp
> @llvm.eh.sjlj.longjmp
> @llvm.eh.sjlj.lsda
> @llvm.eh.sjlj.callsite
>
> Is there a way to force Clang to use SJLJ exception handling for C++? That
> way I would be able to look at its output to learn how to use them correctly.
>
> Otherwise, how should I go about finding out how to implement SJLJ exception
> handling in my frontend?
>
> Thanks!
> Nicolas
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From shuxin.llvm at gmail.com Fri Jul 12 15:49:06 2013
From: shuxin.llvm at gmail.com (Shuxin Yang)
Date: Fri, 12 Jul 2013 15:49:06 -0700
Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage.
Message-ID: <51E087E2.5040101@gmail.com>

Hi, there:

This is the proposal for parallelizing the post-IPO stage. See the
following for details. I also attach a toy-grade rudimentary
implementation, which can be used to illustrate some of the concepts
here. This patch is not going to be committed.

Unfortunately, this weekend I will be too busy to read emails. Please do
not construe a delayed response as being rude :-). Thanks a lot in
advance for your time and insightful comments!

Shuxin


The proposal
------------
It is organized as follows:
 1) background info; if you heard "/usr/bin/ls", please skip it
 2) the motivation for parallelizing the post-IPO stage
 3) how to parallelize post-IPO
 4) the linker problems
 5) the toy-grade rudimentary implementation
 6) misc

1. Some background
------------------
Interprocedural-optimization compilation, aka IPO or IPA, typically
consists of three stages:

S1) pre-IPO
  Each function goes through some analysis and not-very-aggressive
  optimizations. Some information is collected during this stage; this
  info will be passed to the IPO stages and is usually called summary
  info. The result of this stage is "fake-objects": binary files that
  use some known object format to encapsulate the IR as well as the
  summary info, along with other stuff.

S2) IPO
  The compiler works with the linker to resolve and merge symbols in the
  "fake-objects". Then interprocedural analyses (IPA) are invoked to
  perform interprocedural analysis, either based on the summary info or
  directly on the IR. Interprocedural optimizations (IPO) are called
  based on the IPA results.

  In some compilers, IPA and IPO are separated. One reason is that many
  IPAs can be conducted directly on the concise summary info, while many
  IPOs need to load IRs and bulky annotation/metadata into memory.

S3) post-IPO
  Typically consists of Loop-Nest-Opt, Scalar Opt, CodeGen, etc. While
  these are intra-procedural analyses/optimizers, they may directly
  benefit from the info collected in the IPO stages and passed down the
  road.

LLVM collectively calls S2 and S3 "LTO CodeGen", which is very confusing.

2. Why parallelize the post-IPO stage
=====================================

R1) To improve scalability.
  It is quite obvious that we are not able to put everything about a
  monster program in memory at once. Even if you can afford an expensive
  computer, the address space of a single compiler process cannot
  accommodate a monster program.

R2) To take advantage of ample HW resources to shorten compile time.

R3) To make debugging a lot easier. One can triage problems in a much
  smaller partition rather than the huge monster program.
This proposal is not able to hit goal R1 at this moment, because during
the IPO stage the compiler currently brings everything into memory at
once.

3. How to parallelize the post-IPO stage
========================================

From 5k' high, the concept is very simple:
 step 1) divide the merged IR into small pieces,
 step 2) compile each of these pieces independently, and
 step 3) feed the objects of each piece back to the linker to be linked
         into an executable or a dynamic lib.

Sections 3.1 through 3.3 describe these three steps respectively.

3.1. Partitioning
-----------------
Partitioning is to cut a reasonably-sized chunk from the big merged IR.
It roughly consists of two steps: 1) determine the partition scheme,
which is the relatively easy step, and 2) physically scoop the partition
out of the merged IR, which is much more involved.

3.1.1 Figure out the partition scheme
-------------------------------------
We randomly pick some functions and put them in a partition. It would be
nice to perform some optimization at this moment. One opt in my mind is
to reorder functions in order to reduce the working set and improve
locality. Unfortunately, this opt seems to be a bit blind at this time,
because:
 - the CallGraph is not annotated with estimated or profiled frequency;
 - some linkers don't respect the order. It seems they just remember the
   function order of the pristine input obj/fake-obj, and enforce this
   order at the final link (link-exec/shared-lib) stage.

Anyway, I try to ignore all these problems and perform the partition via
the following steps. Maybe we will have some luck on some platforms:

 o. DFS the call-graph, ignoring the self-recursive edges; if frequency
    info is available, prioritize the edges (i.e. the corresponding
    call-sites) such that frequent edges are visited first.

 o. Cut the DFS spanning tree obtained from the previous step bottom-up.
    Each cut/partition contains a reasonable number of functions, and
    the aggregate size of the functions of the partition should not
    exceed a predefined threshold.

 o. Repeat the previous step until the call-graph's DFS spanning tree is
    empty.

3.1.2 Partition transformation
------------------------------
This is a bit involved. There are a bunch of problems we have to tackle.

1) When the uses/defs of a symbol are separated into different modules,
its attributes, like linkage and visibility, need to be changed as well.

 [Example 1] If a symbol is flagged as "internal" to the module where it
 is defined, its linkage needs to be changed into "internal" to the
 executable/lib being compiled.

 [Example 2] For compile-time constants, their initialized value needs
 to be cloned to the partitions where they are referenced. The rationale
 is to let the post-IPO passes take advantage of the initialized value
 to squeeze out some performance. In order not to bloat the code size,
 the cloned constants should be marked "don't emit". [end of eg2]

Being able to precisely update symbols' attributes is not only vital to
correctness; it has a significant impact on performance as well. I have
not yet made a thorough investigation of this issue. My rudimentary
implementation simply flags a symbol "external" when its uses/defs are
separated into different modules. I believe this is one of the most
difficult parts of this work. I guess it is going to take a long time to
become stable.

2) In order to compile each partition in a separate thread (see Section
3.2), we have to put the partitions in distinct LLVMContexts. I could be
wrong, but I don't find code which is able to perform function cloning
across LLVMContexts. My workaround in the patch is to perform the
function cloning in one LLVMContext (but in a different Module, of
course), then save the module to a disk file, and load it into memory
using a new LLVMContext. It is a bit circuitous and expensive.
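Purely as an illustration (this is not the patch's code), the
partition-scheme steps of 3.1.1 can be sketched in a few lines of Python.
The call graph is assumed to be a plain adjacency map and the per-function
"sizes" are supplied by the caller; frequency-based edge ordering is left
out:

```python
def partition_call_graph(call_graph, size_of, max_part_size):
    """Sketch of 3.1.1: DFS the call graph (ignoring self-recursive
    edges), then cut the resulting spanning tree bottom-up so each
    partition's aggregate size stays near a predefined threshold."""
    visited, partitions = set(), []
    current, current_size = [], 0

    def dfs(fn):
        nonlocal current, current_size
        visited.add(fn)
        for callee in call_graph.get(fn, ()):
            if callee != fn and callee not in visited:  # skip self-recursion
                dfs(callee)
        # Post-order append == bottom-up cut: callees land before callers.
        current.append(fn)
        current_size += size_of[fn]
        if current_size >= max_part_size:
            partitions.append(current)
            current, current_size = [], 0

    for fn in call_graph:  # repeat until the whole graph is covered
        if fn not in visited:
            dfs(fn)
    if current:
        partitions.append(current)
    return partitions
```

With frequency info available, one would sort each callee list before
recursing; the real work, as 3.1.2 explains, is fixing up linkage and
visibility after the cut, which this sketch ignores entirely.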
One random observation: currently, function-scoped static variables are
considered "global variables". When cloning a function with a static
variable, the compiler has no idea whether the static variable is used
only by the function being cloned, and hence whether it can separate the
function and the variable. I guess it would be nice if we organized
symbols by their scope instead of their lifetime; it would be convenient
for this situation.

3.2 Compile partitions independently
------------------------------------
There are two camps: one camp advocates compiling the partitions via
multiple processes, the other favors multiple threads. Inside the Apple
compiler teams, I'm the only one belonging to the first camp. I think
that while multi-proc sounds a bit redneck, it has its advantages for
this purpose, and while multi-thread is certainly more eye-popping, it
has its advantages as well.

The advantages of multi-proc are:
 1) it is easier to implement; the processes run in their own address
    spaces, so we don't need to worry about them interfering with each
    other;
 2) huge, if not unlimited, address space.

Its disadvantage is that it is expensive. But I guess the cost is almost
negligible compared to the overall IPO compilation.

The advantages of multi-thread I can imagine are:
 1) it sounds fancy;
 2) it is light-weight;
 3) inter-thread communication is easier than IPC.

Its disadvantages are:
 1) Oftentimes we will come across race conditions, and it takes an
    awfully long time to figure them out. While the code is supposed to
    be multi-thread safe, we might miss some tricky cases.
    Trouble-shooting a race condition is a nightmare.
 2) Small address space. This is a big problem if the compiler is built
    32-bit. In that case, the compiler is not able to bring lots of
    stuff into memory even if the HW does provide ample memory.
 3) The thread-safe run-time lib is more expensive. I once linked a
    compiler using -lpthread (which I did not have to) on a UNIX
    platform, and saw the compiler slow down by about 1/3.
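For what it's worth, the multi-process flavor can be prototyped almost
for free with the Makefile-driven flow described in section 5: emit one
rule per partition and let `make -jN` choose the parallel factor, so each
partition is compiled by its own llc process in its own address space. A
hypothetical sketch of generating such a makefile (the llc path and file
names are assumptions, not what the patch hard-codes):

```python
def emit_post_ipo_makefile(partitions, llc="$(HOME)/tmp/lto.llc"):
    """Emit a makefile in the shape of the one shown in section 5: one
    shared .bc -> .o pattern rule for the partitions, plus an `ld -r`
    step folding the partial objects back into a single relocatable
    object that can be fed to the final link."""
    objs = " ".join(bc.replace(".bc", ".o") for bc in partitions)
    return "\n".join([
        ".PHONY: all",
        "all: merged",
        "",
        "%.o: %.bc",
        "\t{} -filetype=obj $+ -o $@".format(llc),
        "",
        "merged: " + objs,
        "\t/usr/bin/ld $+ -r -o $@",
        "",
    ])
```

Running the generated file with `make -j4` then compiles up to four
partitions concurrently, with the process isolation the multi-proc camp
wants and no scheduler code to write.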
I'm not able to convince the folks in the other camp, and neither are
they able to convince me. I decided to implement both. Fortunately, this
part is not difficult; it seems rather easy to crank out either one
within a short period of time. It would be interesting to compare them
side-by-side and see which camp loses :-). On the other hand, if we run
into race-condition problems, we can choose the multi-proc version as a
fall-back.

Regardless of which technique is used to compile the partitions
independently, in order to judiciously and adaptively choose an
appropriate parallel factor, the compiler certainly needs a lib which is
able to figure out the load the entire system is under. I don't know if
such a magic lib exists or not.

4. The tale of two kinds of linker
----------------------------------
As far as I can tell, llvm supports two kinds of linker for its IPO
compilation, and the support is embodied by two sets of APIs/interfaces:

 o. interface 1: the stuff named lto_xxxx();
 o. the GNU gold interface: the compiler interacts with GNU gold via the
    adapter implemented in tools/gold/gold-plugin.cpp. This adapter
    calls interface 1 to control the IPO process. It does not have to
    call the interface APIs; I think it is definitely ok if it calls
    internal functions.

The compiler used to generate a single object file from the merged IR;
now it will generate multiple of them, one for each partition. So
interface 1 is *NOT* sufficient any more.

For gold linker users, it is easy to make them happy just by hacking the
adapter, informing the linker of the new input object files. This is
done transparently; the users don't need to install a new ld.

For those systems where ld is invoked interacting with
libLTO.{so,dylib}, the ld has to accept the new APIs I added to
interface 1 in order to enable the new functionality. Or maybe we can
invoke '/the/path/to/ld -r *.o -o merged.o' and feed the merged.o to the
linker (this will keep the interface intact)?

Unfortunately, that does not work at all: how can I know the path to the
ld? libLTO.{so,dylib} is invoked as a plugin; it cannot see the argv.
How about hacking around this by adding a nasty flag pointing to the
right ld? Well, it works. However, I don't believe many people would
like to do it this way, and that means I would lose the huge number of
"QA" folks who are working hard for this compiler.

What's wrong with interface 1? The ld side is more active than the
compiler side; however, in concept the IPO is driven by the compiler
side. This means the interface changes over time. In contrast, the gold
interface (as I reverse-engineer it from the adapter code) is more
symbol-centric, taking little IPO-specific detail into account. That
interface is simple and stable.

5. The rudimentary implementation
---------------------------------
I made it work for bzip2 at cpu2kint yesterday. bzip2 is a "tiny"
program; I intentionally lowered the partition size to get 3 partitions.
There are no comments in the code, and it definitely needs a rewrite. I
just checked correctness (with the ref input), and I didn't measure how
much it degrades performance (due to the problem I have not yet had a
chance to tackle; see section 3.1.2, the symbol attribute stuff).

The control flow basically is:
 1) add a module pass to the IPO pass-manager, and figure out the
    partition scheme;
 2) physically partition the merged module; the IR and the obj of each
    partition are placed in a new dir, "llvmipo" by default:

    -- ls llvmipo/
    Makefile  merged  part1.bc  part1.o  part2.bc  part2.o  part3.bc  part3.o
    --

 3) for demo purposes, I drive the post-IPO stage via a makefile, which
    encapsulates hacks and other nasty stuff. NOTE that the post-IPO
    pass in my hack contains only the CodeGen pass; we need to
    reorganize PassManagerBuilder::populateLTOPassManager(), which
    intermingles IPO passes with intra-proc scalar passes, and move the
    intra-proc scalar passes to the post-IPO stage.
    .PHONY = all


    BC  = part1.bc part2.bc part3.bc
    OBJ = ${BC:.bc=.o}

    all : merged
    %.o : %.bc
        $(HOME)/tmp/lto.llc -filetype=obj $+ -o $@

    merged : $(OBJ)
        /usr/bin/ld $+ -r -o $@

 4) as the Makefile suggests, the *.o files of the partitions are linked
    into a single object, "merged", and fed back to the link.

6) Miscellaneous
================
Will partitioning degrade performance in theory? I think it depends on
the definition of performance. If performance means execution time, I
guess it does not. However, if performance includes code size, I think
it may have some negative impact. The following are a few scenarios:
 - constants generated by the post-IPO passes are not shared across
   partitions;
 - dead functions may be detected during the post-IPO stage, and they
   may not be deleted.

-------------- next part --------------
Index: tools/lto/LTOCodeGenerator.cpp
===================================================================
--- tools/lto/LTOCodeGenerator.cpp (revision 186109)
+++ tools/lto/LTOCodeGenerator.cpp (working copy)
@@ -17,8 +17,10 @@
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/Analysis/Passes.h"
 #include "llvm/Analysis/Verifier.h"
+#include "llvm/Analysis/CallGraph.h"
 #include "llvm/Bitcode/ReaderWriter.h"
 #include "llvm/Config/config.h"
+#include "llvm/ADT/SetVector.h"
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/DerivedTypes.h"
@@ -29,16 +31,19 @@
 #include "llvm/MC/MCContext.h"
 #include "llvm/MC/SubtargetFeature.h"
 #include "llvm/PassManager.h"
+#include "llvm/IRReader/IRReader.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/FormattedStream.h"
 #include "llvm/Support/Host.h"
 #include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/Program.h"
 #include "llvm/Support/Signals.h"
 #include "llvm/Support/TargetRegistry.h"
 #include "llvm/Support/TargetSelect.h"
 #include "llvm/Support/ToolOutputFile.h"
 #include "llvm/Support/system_error.h"
+#include "llvm/Support/SourceMgr.h"
#include "llvm/Target/Mangler.h" #include "llvm/Target/TargetMachine.h" #include "llvm/Target/TargetOptions.h" @@ -46,8 +51,11 @@ #include "llvm/Transforms/IPO.h" #include "llvm/Transforms/IPO/PassManagerBuilder.h" #include "llvm/Transforms/ObjCARC.h" +#include "llvm/Transforms/Utils/ValueMapper.h" +#include "llvm/Transforms/Utils/Cloning.h" using namespace llvm; + static cl::opt DisableOpt("disable-opt", cl::init(false), cl::desc("Do not run any optimization passes")); @@ -68,12 +76,154 @@ #endif } +class ModPartScheme { +public: + typedef SetVector PartitionTy; + typedef PartitionTy::iterator iterator; + typedef PartitionTy::const_iterator const_iterator; + + ModPartScheme() {} + ModPartScheme(const ModPartScheme &That) : Partition(That.Partition) {} + + void AddFunction(Function *F) { Partition.insert(F); } + + iterator begin() { return Partition.begin(); } + iterator end() { return Partition.end(); } + const_iterator begin() const { return Partition.begin(); } + const_iterator end() const { return Partition.end(); } + int size() { return Partition.size(); } + bool count(Function *F) const { return Partition.count(F); } + +private: + PartitionTy Partition; +}; + +class ModPartSchemeMgr { +public: + typedef std::vector MPSchemeTy; + typedef MPSchemeTy::iterator iterator; + typedef MPSchemeTy::const_iterator const_iterator; + + ~ModPartSchemeMgr(); + + ModPartScheme *CreateEmptyPartition(void) { + ModPartScheme *P = new ModPartScheme; + PartSchemes.push_back(P); + return P; + } + iterator begin() { return PartSchemes.begin(); } + iterator end() { return PartSchemes.end(); } + const_iterator begin() const { return PartSchemes.begin(); } + const_iterator end() const { return PartSchemes.end(); } + int empty() const { return PartSchemes.empty(); } + +private: + MPSchemeTy PartSchemes; +}; + +class ModPartAnalysis : public ModulePass { +public: + static char ID; + ModPartAnalysis(ModPartSchemeMgr &MPSM): + ModulePass(ID), ModPartMgr(MPSM), CG(0) {} + + virtual bool 
runOnModule(Module &M); + virtual void getAnalysisUsage(AnalysisUsage &AU) const; + +private: + // Partition threshold, currently the metric for "size" is the number + // of functions in a partition. + enum { + MaxFuncInPart = 3 + }; + + class SizeMetric { + public: + SizeMetric(int func_num=0) : FuncNum(func_num) {}; + bool ExceedThreshold() const { return FuncNum > MaxFuncInPart; } + bool ExceedThresholdTooMuch() const + { return FuncNum >= MaxFuncInPart * 3 / 2; } + void IncFuncNum(int amt = 1) { FuncNum += amt; }; + const SizeMetric& operator+=(const SizeMetric &That) + { FuncNum += That.FuncNum; return *this; } + void Reset() { FuncNum = 0; } + + private: + int FuncNum; + }; + + void setVisited(CallGraphNode *N) { Visited[N] = true; } + bool isVisited(CallGraphNode *N) const { + return Visited.find(N) != Visited.end(); + } + + SizeMetric PerformPartitionHelper(CallGraphNode *Root); + void EmitPartition(CallGraphNode *DFSRoot, SizeMetric &SM); + SizeMetric EvaluateModuleSize(const Module *M) const; + +private: + ModPartSchemeMgr &ModPartMgr; + CallGraph *CG; + std::vector DFSStack; + SizeMetric RemainingModSize; + DenseMap Visited; +}; + +char ModPartAnalysis::ID = 0; + +class ModPartXform { +public: + ModPartXform(Module *Mod, ModPartSchemeMgr &MPSM, IPOPartMgr &PM) : + PartSchemeMgr(MPSM), IPOPartMgr(PM), MergedModule(Mod), NextPartId(0) {} + + void getWorkDir(); + + void PerformTransform(); + +private: + IPOPartition *PerformTransform(ModPartScheme &PartScheme); + + void CollectGlobalSymbol(ModPartScheme &Part, Module *New, + ValueToValueMapTy &VMap); + void CollectGlobalSymbol(Function *F, Module *New, + ValueToValueMapTy &VMap); + + Function *CreateFuncDecl(const Function *F, Module *NewMod); + GlobalVariable *CreateVarDecl(const GlobalVariable *GV, Module *NewMod); + +private: + ModPartSchemeMgr &PartSchemeMgr; + IPOPartMgr &IPOPartMgr; + Module *MergedModule; + int NextPartId; +}; + +class PostIPOCompile { +public: + PostIPOCompile(IPOPartMgr &IPM, 
IPOFileMgr &IFM, bool ToMergeObjs = false) : + PartMgr(IPM), FileMgr(IFM), MergedObjFile(0), MergeObjs(ToMergeObjs) {} + + IPOFile *getMergedObjFile() const { return MergedObjFile; } + + bool Compile(std::string &ErrMsg); + +private: + bool generateMakefile(std::string &ErrMsg); + +private: + IPOPartMgr &PartMgr; + IPOFileMgr &FileMgr; + IPOFile *MergedObjFile; + bool MergeObjs; +}; + LTOCodeGenerator::LTOCodeGenerator() : _context(getGlobalContext()), _linker(new Module("ld-temp.o", _context)), _target(NULL), _emitDwarfDebugInfo(false), _scopeRestrictionsDone(false), _codeModel(LTO_CODEGEN_PIC_MODEL_DYNAMIC), - _nativeObjectFile(NULL) { + _nativeObjectFile(NULL), + _IPOPartMgr(_IPOFileMgr) { InitializeAllTargets(); InitializeAllTargetMCs(); InitializeAllAsmPrinters(); @@ -161,34 +311,42 @@ } bool LTOCodeGenerator::compile_to_file(const char** name, std::string& errMsg) { - // make unique temp .o file to put generated object file - SmallString<128> Filename; - int FD; - error_code EC = sys::fs::createTemporaryFile("lto-llvm", "o", FD, Filename); - if (EC) { - errMsg = EC.message(); + if (determineTarget(errMsg)) return true; - } - // generate object file - tool_output_file objFile(Filename.c_str(), FD); + PostIPOCompile PostIPOStage(_IPOPartMgr, _IPOFileMgr, true/*merge objects*/); + if (!_IPOFileMgr.CreateWorkDir(errMsg)) + return true; - bool genResult = generateObjectFile(objFile.os(), errMsg); - objFile.os().close(); - if (objFile.os().has_error()) { - objFile.os().clear_error(); - sys::fs::remove(Twine(Filename)); + performIPO(errMsg, true); + + if (!PostIPOStage.Compile(errMsg)) return true; - } - objFile.keep(); - if (genResult) { - sys::fs::remove(Twine(Filename)); + *name = PostIPOStage.getMergedObjFile()->getPath().c_str(); + return false; +} + +bool LTOCodeGenerator::compile_to_files(const char** name, std::string& errMsg) { + if (determineTarget(errMsg)) return true; + + performIPO(errMsg); + + // Parallelize post-IPO + _nativeObjectPath.clear(); + 
PostIPOCompile PostIPOStage(_IPOPartMgr, _IPOFileMgr);
+  if (!PostIPOStage.Compile(errMsg))
+    return true;
+
+  for (IPOPartMgr::iterator I = _IPOPartMgr.begin(), E = _IPOPartMgr.end();
+       I != E; I++) {
+    _nativeObjectPath.append((*I)->getObjFilePath().data());
+    _nativeObjectPath.append(1, '\0');
+  }
+  _nativeObjectPath.append(1, '\0');
+  *name = _nativeObjectPath.c_str();

-  _nativeObjectPath = Filename.c_str();
-  *name = _nativeObjectPath.c_str();
   return false;
 }

@@ -357,16 +515,12 @@
   _scopeRestrictionsDone = true;
 }

-/// Optimize merged modules using various IPO passes
-bool LTOCodeGenerator::generateObjectFile(raw_ostream &out,
-                                          std::string &errMsg) {
-  if (this->determineTarget(errMsg))
-    return true;
+void LTOCodeGenerator::performIPO(std::string &errMsg, bool PerformPartition) {

   Module* mergedModule = _linker.getModule();

   // Mark which symbols can not be internalized
-  this->applyScopeRestrictions();
+  applyScopeRestrictions();

   // Instantiate the pass manager to organize the passes.
   PassManager passes;
@@ -390,13 +544,30 @@
   // Make sure everything is still good.
   passes.add(createVerifierPass());

+  ModPartSchemeMgr MPSM;
+  if (PerformPartition)
+    passes.add(new ModPartAnalysis(MPSM));
+
+  passes.run(*mergedModule);
+
+  if (!MPSM.empty()) {
+    ModPartXform MPT(mergedModule, MPSM, _IPOPartMgr);
+    MPT.PerformTransform();
+  } else {
+    IPOPartition *P = _IPOPartMgr.CreateIPOPart(mergedModule);
+    P->SaveBitCode();
+  }
+}
+
+bool LTOCodeGenerator::performPostLTO(Module *Mod, formatted_raw_ostream &Out,
+                                      std::string &errMsg) {
+  // placeholder for post-IPO scalar opt
+
   PassManager codeGenPasses;

   codeGenPasses.add(new DataLayout(*_target->getDataLayout()));
   _target->addAnalysisPasses(codeGenPasses);

-  formatted_raw_ostream Out(out);
-
   // If the bitcode files contain ARC code and were compiled with optimization,
   // the ObjCARCContractPass must be run, so do it unconditionally here.
codeGenPasses.add(createObjCARCContractPass());
@@ -404,16 +575,31 @@
   if (_target->addPassesToEmitFile(codeGenPasses, Out,
                                    TargetMachine::CGFT_ObjectFile)) {
     errMsg = "target file type not supported";
+    return true;
+  }
+
+  // Run the code generator, and write assembly file
+  codeGenPasses.run(*Mod);
+  return false;
+}
+
+/// Optimize merged modules using various IPO passes
+bool LTOCodeGenerator::generateObjectFile(Module *Mod, const char *FN,
+                                          std::string &errMsg) {
+  std::string errFile;
+  tool_output_file OutFile(FN, errFile, raw_fd_ostream::F_Binary);
+
+  if (!errFile.empty()) {
+    errMsg += errFile;
     return true;
   }
+  OutFile.keep();

-  // Run our queue of passes all at once now, efficiently.
-  passes.run(*mergedModule);
+  formatted_raw_ostream OS(OutFile.os());
+  bool Fail = performPostLTO(Mod, OS, errMsg);
+  OutFile.os().close();

-  // Run the code generator, and write assembly file
-  codeGenPasses.run(*mergedModule);
-
-  return false; // success
+  return Fail;
 }

 /// setCodeGenDebugOptions - Set codegen debugging options to aid in debugging
@@ -428,3 +614,495 @@
     _codegenOptions.push_back(strdup(o.first.str().c_str()));
   }
 }
+
+///////////////////////////////////////////////////////////////////////////
+//
+//  Implementation of ModPartSchemeMgr, ModPartXform
+//
+///////////////////////////////////////////////////////////////////////////
+//
+ModPartSchemeMgr::~ModPartSchemeMgr() {
+  while (!PartSchemes.empty()) {
+    delete PartSchemes.back();
+    PartSchemes.pop_back();
+  }
+}
+
+Function *ModPartXform::CreateFuncDecl(const Function *F, Module *NewModule) {
+  Function *NF = Function::Create(F->getFunctionType(),
+                                  GlobalValue::ExternalLinkage,
+                                  F->getName(), NewModule);
+  NF->copyAttributesFrom(F);
+  return NF;
+}
+
+static void PromoteGlobalVarLinkage(GlobalVariable *GV) {
+  GV->setLinkage(GlobalValue::ExternalLinkage);
+}
+
+static void PromoteGlobalFuncLinkage(Function *F) {
+  F->setLinkage(GlobalValue::ExternalLinkage);
+}
+
+GlobalVariable
*ModPartXform::CreateVarDecl(const GlobalVariable *GV,
+                             Module *NewMod) {
+  GlobalVariable *G;
+  G = new GlobalVariable(*NewMod, GV->getType()->getElementType(),
+                         GV->isConstant(),
+                         GlobalValue::ExternalLinkage,
+                         0 /* InitVal */,
+                         Twine(GV->getName()),
+                         0 /* Before */,
+                         GV->getThreadLocalMode(),
+                         GV->getType()->getPointerAddressSpace(),
+                         GV->hasInitializer() ? true : false);
+  return G;
+}
+
+void ModPartXform::CollectGlobalSymbol(Function *F,
+                                       Module *New,
+                                       ValueToValueMapTy &VMap) {
+  DenseMap<Constant *, bool> Visited;
+  SmallVector<Constant *, 8> WorkList;
+
+  for (Function::iterator BI = F->begin(), BE = F->end(); BI != BE; BI++) {
+    for (BasicBlock::iterator II = BI->begin(), IE = BI->end();
+         II != IE; II++) {
+      Instruction &Inst = *II;
+      for (User::op_iterator op = Inst.op_begin(), E = Inst.op_end();
+           op != E; ++op) {
+        if (Constant *C = dyn_cast<Constant>(*op)) {
+          if (!isa(C) && Visited.find(C) == Visited.end()) {
+            Visited[C] = true;
+            WorkList.push_back(C);
+          }
+        }
+      }
+    }
+
+    while (!WorkList.empty()) {
+      Constant *C = WorkList.pop_back_val();
+      if (GlobalVariable *GV = dyn_cast<GlobalVariable>(C)) {
+        if (VMap.find(GV) == VMap.end()) {
+          VMap[GV] = CreateVarDecl(GV, New);
+          PromoteGlobalVarLinkage(GV);
+        }
+        continue;
+      } else if (Function *Func = dyn_cast<Function>(C)) {
+        if (VMap.find(Func) == VMap.end()) {
+          VMap[Func] = CreateFuncDecl(Func, New);
+          PromoteGlobalFuncLinkage(Func);
+        }
+        continue;
+      }
+
+      for (User::const_op_iterator I = C->op_begin(), E = C->op_end();
+           I != E; ++I) {
+        Constant *C2 = dyn_cast<Constant>(*I);
+        if (C2 && Visited.find(C2) == Visited.end()) {
+          Visited[C2] = true;
+          WorkList.push_back(C2);
+        }
+      }
+    }
+  }
+}
+
+void ModPartXform::CollectGlobalSymbol(ModPartScheme &Part, Module *New,
+                                       ValueToValueMapTy &VMap) {
+  for (ModPartScheme::iterator I = Part.begin(), E = Part.end();
+       I != E; I++) {
+    const Function *F = *I;
+    VMap[F] = CreateFuncDecl(F, New);
+  }
+
+  for (ModPartScheme::iterator I = Part.begin(), E = Part.end();
+       I != E; I++)
+    CollectGlobalSymbol(*I, New, VMap);
+}
+
+void __attribute__((used)) dump_module(Module *M) {
+  std::string EI;
+  tool_output_file f("module.ll", EI);
+  f.keep();
+
+  M->print(f.os(), 0);
+}
+
+void __attribute__((used)) dump_type(Type *T) {
+  T->dump();
+}
+
+// Split the merged module by moving the specified functions to the
+// new module.
+IPOPartition *ModPartXform::PerformTransform(ModPartScheme &PartScheme) {
+  std::string FN;
+  raw_string_ostream OS(FN);
+  OS << "partition" << NextPartId++;
+
+  Module *NewMod = new Module(OS.str(), MergedModule->getContext());
+  NewMod->setDataLayout(MergedModule->getDataLayout());
+  NewMod->setTargetTriple(MergedModule->getTargetTriple());
+
+  ValueToValueMapTy VMap;
+  CollectGlobalSymbol(PartScheme, NewMod, VMap);
+
+  // Copy over functions in the partition
+  for (ModPartScheme::iterator I = PartScheme.begin(), E = PartScheme.end();
+       I != E; I++) {
+    Function *OF = *I;
+    Function *NF = cast<Function>(VMap[OF]);
+
+    // Steal some code from llvm::CloneFunction.
+    {
+      // Loop over the arguments, copying the names of the mapped arguments over...
+      Function::arg_iterator DestI = NF->arg_begin();
+      for (Function::const_arg_iterator I = OF->arg_begin(), E = OF->arg_end();
+           I != E; ++I)
+        // Is this argument already mapped?
+        if (VMap.count(I) == 0) {
+          DestI->setName(I->getName()); // Copy the name over...
+          VMap[I] = DestI++;            // Add mapping to VMap
+        }
+    }
+    SmallVector<ReturnInst *, 8> Returns;
+    CloneFunctionInto(NF, OF, VMap, true, Returns);
+
+    OF->deleteBody();
+  }
+
+  IPOPartition *NewPart = IPOPartMgr.CreateIPOPart(NewMod);
+
+  // We have to save the module to disk so that the next time it is loaded
+  // it belongs to a different context.
+  NewPart->SaveBitCode();
+
+  return NewPart;
+}
+
+void ModPartXform::PerformTransform() {
+  for (ModPartSchemeMgr::iterator I = PartSchemeMgr.begin(),
+       E = PartSchemeMgr.end(); I != E; I++)
+    (void)PerformTransform(**I);
+
+  IPOPartition *MP = IPOPartMgr.CreateIPOPart(MergedModule);
+  MP->SaveBitCode();
+}
+
+///////////////////////////////////////////////////////////////////////////
+//
+//  Implementation of ModPartAnalysis
+//
+///////////////////////////////////////////////////////////////////////////
+//
+void ModPartAnalysis::getAnalysisUsage(AnalysisUsage &AU) const {
+  AU.setPreservesAll();
+  AU.addRequired<CallGraph>();
+}
+
+void ModPartAnalysis::EmitPartition(CallGraphNode *DFSRoot, SizeMetric &SM) {
+  ModPartScheme *P = ModPartMgr.CreateEmptyPartition();
+  while (!DFSStack.empty()) {
+    CallGraphNode *N = DFSStack.back();
+    P->AddFunction(N->getFunction());
+    DFSStack.pop_back();
+    if (N == DFSRoot)
+      break;
+  }
+}
+
+ModPartAnalysis::SizeMetric
+ModPartAnalysis::PerformPartitionHelper(CallGraphNode *R) {
+  SizeMetric SM;
+  setVisited(R);
+
+  // Skip dummy call-graph node or declaration
+  {
+    Function *F = R->getFunction();
+    if (!F || F->isDeclaration())
+      return SM;
+  }
+
+  DFSStack.push_back(R);
+  SM.IncFuncNum();
+
+  for (CallGraphNode::iterator I = R->begin(), E = R->end(); I != E; I++) {
+    CallGraphNode *Callee = (*I).second;
+    if (isVisited(Callee))
+      continue;
+
+    setVisited(Callee);
+
+    // Skip dummy call-graph node or declaration
+    Function *F = Callee->getFunction();
+    if (!F || F->isDeclaration())
+      continue;
+
+    SizeMetric T = PerformPartitionHelper(Callee);
+    bool Emit = false;
+
+    if (T.ExceedThreshold())
+      Emit = true;
+    else {
+      SM += T;
+      Emit = SM.ExceedThreshold();
+    }
+
+    if (Emit) {
+      EmitPartition(R, SM);
+      SM.Reset();
+      if (!RemainingModSize.ExceedThresholdTooMuch())
+        break;
+    }
+  }
+  return SM;
+}
+
+// Return the "size" of the given module.
+ModPartAnalysis::SizeMetric ModPartAnalysis::EvaluateModuleSize
+  (const Module *M) const {
+  SizeMetric S;
+  for (Module::const_iterator I = M->begin(), E = M->end(); I != E; I++) {
+    const Function &F = *I;
+    if (!F.isDeclaration())
+      S.IncFuncNum();
+  }
+  return S;
+}
+
+bool ModPartAnalysis::runOnModule(Module &M) {
+  SizeMetric S = EvaluateModuleSize(&M);
+  if (!S.ExceedThresholdTooMuch()) {
+    // The module may be big, but not big enough to be worth partitioning.
+    return false;
+  }
+
+  if (!(CG = getAnalysisIfAvailable<CallGraph>()))
+    return false;
+
+  CallGraphNode *R = CG->getRoot();
+  (void)PerformPartitionHelper(R);
+
+  return false;
+}
+
+// /////////////////////////////////////////////////////////////////////////////
+//
+//      Implementation of IPOPartition and IPOPartMgr
+//
+// /////////////////////////////////////////////////////////////////////////////
+//
+IPOPartition::IPOPartition(Module *M, const char *NameWoExt, IPOFileMgr &FM) :
+  Mod(0), Ctx(0), IRFile(0), ObjFile(0), FileNameWoExt(NameWoExt), FileMgr(FM) {
+}
+
+IPOFile &IPOPartition::getIRFile() const {
+  if (IRFile)
+    return *IRFile;
+  else {
+    std::string FN(FileNameWoExt + ".bc");
+    return *(IRFile = FileMgr.CreateIRFile(FN.c_str()));
+  }
+}
+
+IPOFile &IPOPartition::getObjFile() const {
+  if (ObjFile)
+    return *ObjFile;
+  else {
+    std::string FN(FileNameWoExt + ".o");
+    return *(ObjFile = FileMgr.CreateObjFile(FN.c_str()));
+  }
+}
+
+
+bool IPOPartition::SaveBitCode() {
+  if (!Mod) {
+    // The bitcode has already been saved to disk.
+    return true;
+  }
+
+  IPOFile &F = getIRFile();
+  if (F.ErrOccur())
+    return false;
+
+  raw_fd_ostream OF(F.getPath().c_str(), F.getLastErrStr(),
+                    raw_fd_ostream::F_Binary);
+  WriteBitcodeToFile(Mod, OF);
+  OF.close();
+
+  Mod = 0;
+  delete Ctx;
+  Ctx = 0;
+
+  return !F.ErrOccur();
+}
+
+bool IPOPartition::LoadBitCode() {
+  if (Mod)
+    return true;
+
+  IPOFile &F = getIRFile();
+  if (F.ErrOccur())
+    return false;
+
+  Ctx = new LLVMContext;
+  SMDiagnostic Diag;
+  Mod = ParseIRFile(getIRFilePath(), Diag,
*Ctx);
+  if (!Mod) {
+    F.getLastErrStr() = Diag.getMessage();
+    return false;
+  }
+
+  return true;
+}
+
+IPOPartition *IPOPartMgr::CreateIPOPart(Module *M) {
+  std::string PartName;
+  raw_string_ostream OS(PartName);
+  OS << "part" << NextPartId++;
+
+  IPOPartition *P = new IPOPartition(M, OS.str().c_str(), FileMgr);
+  P->Mod = M;
+  IPOParts.push_back(P);
+  return P;
+}
+
+// /////////////////////////////////////////////////////////////////////////////
+//
+//   Implementation of IPOFile and IPOFileMgr
+//
+// /////////////////////////////////////////////////////////////////////////////
+IPOFile::IPOFile(const char *DirName, const char* BaseName, bool KeepFile)
+  : Fname(BaseName), Fpath(DirName), Keep(KeepFile) {
+  Fpath = Fpath + "/" + BaseName;
+  Keep = true;
+}
+
+IPOFile::~IPOFile() {
+  if (!Keep)
+    sys::fs::remove(Fpath);
+}
+
+IPOFileMgr::IPOFileMgr() : WorkDir("llvmipo") {
+  IRFiles.reserve(20);
+  ObjFiles.reserve(20);
+  OtherFiles.reserve(8);
+  KeepFiles = true;
+  WorkDirCreated = false;
+}
+
+IPOFileMgr::~IPOFileMgr() {
+  if (!KeepFiles) {
+    uint32_t NumRm;
+    sys::fs::remove_all(Twine(WorkDir), NumRm);
+  }
+}
+
+bool IPOFileMgr::CreateWorkDir(std::string &ErrorInfo) {
+  if (WorkDirCreated)
+    return true;
+
+  bool Exist;
+  error_code EC = sys::fs::create_directory(Twine(WorkDir), Exist);
+  if (EC == error_code::success()) {
+    WorkDirCreated = true;
+    return true;
+  }
+
+  ErrorInfo = EC.message();
+  return false;
+}
+
+IPOFile *IPOFileMgr::CreateIRFile(const char *Name) {
+  IPOFile *F = CreateFile(Name);
+  IRFiles.push_back(F);
+  return F;
+}
+
+IPOFile *IPOFileMgr::CreateObjFile(const char *Name) {
+  IPOFile *F = CreateFile(Name);
+  ObjFiles.push_back(F);
+  return F;
+}
+
+IPOFile *IPOFileMgr::CreateMakefile(const char *Name) {
+  IPOFile *F = CreateFile(Name);
+  OtherFiles.push_back(F);
+  return F;
+}
+
+// /////////////////////////////////////////////////////////////////////////////
+//
+//              Implementation of PostIPOCompile
+//
+//
/////////////////////////////////////////////////////////////////////////////
+
+// The makefile looks something like this:
+//
+//   .PHONY: all
+//
+//   BC = part1.bc part2.bc part3.bc
+//   OBJ = ${BC:.bc=.o}
+//
+//   all : merged.o
+//   %.o : %.bc
+//       $(HOME)/tmp/lto.llc -filetype=obj $< -o $@
+//
+//   merged.o : $(OBJ)
+//       /usr/bin/ld $+ -r -o $@
+//
+bool PostIPOCompile::generateMakefile(std::string &ErrMsg) {
+
+  IPOFile *MkFile = FileMgr.CreateMakefile("Makefile");
+
+  std::string NewErrMsg;
+  raw_fd_ostream Mk(MkFile->getPath().c_str(), NewErrMsg, 0);
+
+  if (!NewErrMsg.empty()) {
+    ErrMsg += NewErrMsg;
+    return false;
+  }
+
+  std::string BCFiles;
+  for (IPOPartMgr::iterator I = PartMgr.begin(), E = PartMgr.end();
+       I != E; I++) {
+    BCFiles += (*I)->getIRFile().getName();
+    BCFiles += " ";
+  }
+
+  Mk << ".PHONY: all\n\n";
+
+  Mk << "BC = " << BCFiles << "\n";
+  Mk << "OBJ = ${BC:.bc=.o}\n\n";
+
+  if (MergeObjs)
+    Mk << "all : " << MergedObjFile->getName() << "\n";
+  else
+    Mk << "all : $(OBJ)\n";
+
+  // Emit rule
+  Mk << "%.o : %.bc\n\t$(HOME)/tmp/lto.llc -filetype=obj $+ -o $@\n\n";
+
+  if (MergeObjs) {
+    Mk << MergedObjFile->getName() << " : $(OBJ)\n";
+    Mk << "\t/usr/bin/ld $+ -r -o $@\n\n";
+  }
+
+  Mk.close();
+
+  return true;
+}
+
+bool PostIPOCompile::Compile(std::string &ErrMsg) {
+  if (MergeObjs)
+    MergedObjFile = FileMgr.CreateObjFile("merged");
+
+  if (!generateMakefile(ErrMsg))
+    return false;
+
+  const char *args[] = {"/usr/bin/make", "-C", 0, 0};
+  args[2] = FileMgr.getWorkDir().c_str();
+
+  bool Fail;
+  sys::ExecuteAndWait("/usr/bin/make", args, 0/*envp*/, 0/*redirect*/,
+                      0/*wait*/, 0, &ErrMsg, &Fail);
+  return !Fail;
+}
Index: tools/lto/LTOCodeGenerator.h
===================================================================
--- tools/lto/LTOCodeGenerator.h	(revision 186109)
+++ tools/lto/LTOCodeGenerator.h	(working copy)
@@ -18,6 +18,8 @@
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/StringMap.h"
 #include "llvm/Linker.h"
+#include
"llvm/Support/FormattedStream.h"
+#include "llvm/Support/system_error.h"
 #include <string>
 #include <vector>
@@ -28,6 +30,111 @@
   class MemoryBuffer;
   class TargetMachine;
   class raw_ostream;
+
+  class IPOFile;
+  class IPOFileMgr;
+  class IPOPartition {
+  public:
+    bool isInMemory() const { return Mod != 0; }
+    bool SaveBitCode();
+    bool LoadBitCode();
+    const std::string &getIRFilePath() const;
+    const std::string &getObjFilePath() const;
+    Module *getModule() const { return Mod; }
+
+    IPOFile &getIRFile() const;
+    IPOFile &getObjFile() const;
+
+  private:
+    friend class IPOPartMgr;
+    IPOPartition(Module *M, const char *FileNameWoExt, IPOFileMgr &FM);
+
+    Module *Mod;
+    LLVMContext *Ctx;
+    mutable IPOFile *IRFile;
+    mutable IPOFile *ObjFile;
+    std::string FileNameWoExt;
+    IPOFileMgr &FileMgr;
+  };
+
+  class IPOPartMgr {
+  public:
+    typedef std::vector<IPOPartition *> IPOPartsTy;
+    typedef IPOPartsTy::iterator iterator;
+    typedef IPOPartsTy::const_iterator const_iterator;
+
+    iterator begin() { return IPOParts.begin(); }
+    iterator end() { return IPOParts.end(); }
+    const_iterator begin() const { return IPOParts.begin(); }
+    const_iterator end() const { return IPOParts.end(); }
+
+    IPOPartition *CreateIPOPart(Module *);
+
+    IPOPartMgr(IPOFileMgr &IFM) : FileMgr(IFM), NextPartId(1) {}
+
+  private:
+    IPOPartsTy IPOParts;
+    IPOFileMgr &FileMgr;
+    int NextPartId;
+  };
+
+  class IPOFile {
+  public:
+    const std::string &getName() { return Fname; }
+    const std::string &getPath() { return Fpath; }
+
+    error_code &getLastErrCode() { return LastErr; }
+    std::string &getLastErrStr() { return LastErrStr; }
+
+    bool ErrOccur() const {
+      return LastErr != error_code::success() || !LastErrStr.empty();
+    }
+
+    void setKeep() { Keep = true; }
+    bool toKeep() const { return Keep; }
+
+  private:
+    friend class IPOFileMgr;
+    IPOFile(const char* DirName, const char *BaseName, bool Keep=false);
+    ~IPOFile();
+
+  private:
+    std::string Fname;
+    std::string Fpath;
+    error_code LastErr;
+    std::string LastErrStr;
+    bool Keep;
+  };
+
+  class IPOFileMgr {
+  public:
+    IPOFileMgr();
+    ~IPOFileMgr();
+
+    bool CreateWorkDir(std::string &ErrorInfo);
+    const std::string &getWorkDir() { return WorkDir; }
+
+    IPOFile *CreateIRFile(const char *Name);
+    IPOFile *CreateObjFile(const char *Name);
+    IPOFile *CreateMakefile(const char *Name);
+
+    typedef std::vector<IPOFile *> FileVect;
+    FileVect &getIRFiles() { return IRFiles; }
+    FileVect &getObjFiles() { return ObjFiles; }
+
+  private:
+    IPOFile *CreateFile(const char *Name) {
+      return new IPOFile(WorkDir.c_str(), Name);
+    }
+
+  private:
+    FileVect IRFiles;
+    FileVect ObjFiles;
+    FileVect OtherFiles;
+    std::string WorkDir;
+    bool KeepFiles;
+    bool WorkDirCreated;
+  };
 }

 //===----------------------------------------------------------------------===//
@@ -52,11 +159,16 @@
   bool writeMergedModules(const char *path, std::string &errMsg);
   bool compile_to_file(const char **name, std::string &errMsg);
+  bool compile_to_files(const char** name, std::string& errMsg);
   const void *compile(size_t *length, std::string &errMsg);
   void setCodeGenDebugOptions(const char *opts);

 private:
-  bool generateObjectFile(llvm::raw_ostream &out, std::string &errMsg);
+  void performIPO(std::string &errMsg, bool PerformPartition=false);
+  bool performPostLTO(llvm::Module *Mod, llvm::formatted_raw_ostream &Out,
+                      std::string &errMsg);
+
+  bool generateObjectFile(llvm::Module *, const char *Out, std::string &errMsg);
   void applyScopeRestrictions();
   void applyRestriction(llvm::GlobalValue &GV,
                         std::vector<const char *> &mustPreserveList,
@@ -78,6 +190,8 @@
   std::vector<char *> _codegenOptions;
   std::string _mCpu;
   std::string _nativeObjectPath;
+  llvm::IPOPartMgr _IPOPartMgr;
+  llvm::IPOFileMgr _IPOFileMgr;
 };

 #endif // LTO_CODE_GENERATOR_H

From ofv at wanadoo.es  Fri Jul 12 17:01:10 2013
From: ofv at wanadoo.es (Óscar Fuentes)
Date: Sat, 13 Jul 2013 02:01:10 +0200
Subject: [LLVMdev] setjmp/longjmp exception handling: how?
References: <06A5FD1E-76EB-47C1-B24A-296F35EE3FFD@dpmms.cam.ac.uk>
Message-ID: <878v1b5nw9.fsf@wanadoo.es>

Jim Grosbach writes:

> I strongly advise you not to use SjLj exception handling. Use
> zero-cost-exceptions via DWARF instead.

On Windows, DWARF EH doesn't work when there are stack frames not
compiled with DWARF EH support (i.e. OS code that invokes user-provided
callbacks, a common practice on Windows.)

From peter at pcc.me.uk  Fri Jul 12 17:38:36 2013
From: peter at pcc.me.uk (Peter Collingbourne)
Date: Fri, 12 Jul 2013 17:38:36 -0700
Subject: [LLVMdev] Special case list files; a bug and a slowness issue
Message-ID: <20130713003835.GA15193@pcc.me.uk>

Hi,

I need to be able to use a special case list file containing thousands
of entries (namely, a list of libc symbols, to be used when using DFSan
with an uninstrumented libc). Initially I built the symbol list like this:

fun:sym1=uninstrumented
fun:sym2=uninstrumented
fun:sym3=uninstrumented
...
fun:sym6000=uninstrumented

What I found was that, despite various bits of documentation [1,2], the
symbol names are matched as substrings, the root cause being that the
regular expressions built by the SpecialCaseList class do not contain
anchors. The attached unit test demonstrates the problem.

If I modify my symbol list to contain anchors:

fun:^sym1$=uninstrumented
fun:^sym2$=uninstrumented
fun:^sym3$=uninstrumented
...
fun:^sym6000$=uninstrumented

the behaviour is as expected, but compiler run time is slow (on the
order of seconds), presumably because our regex library doesn't cope
with anchors very efficiently.

I intend to resolve the substring bug and the slow run time issue by
using a StringSet for symbol patterns which do not contain regex
metacharacters. There would still be a regex for any other patterns,
which would have anchors added automatically.

Thoughts?
Thanks,

-- 
Peter

[1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizer
[2] https://code.google.com/p/thread-sanitizer/wiki/Flags
-------------- next part --------------
A non-text attachment was scrubbed...
Name: scl.patch
Type: text/x-diff
Size: 842 bytes
Desc: not available
URL: 

From ripzonetriton at gmail.com  Fri Jul 12 20:04:58 2013
From: ripzonetriton at gmail.com (João Matos)
Date: Sat, 13 Jul 2013 04:04:58 +0100
Subject: [LLVMdev] setjmp/longjmp exception handling: how?
In-Reply-To: <878v1b5nw9.fsf@wanadoo.es>
References: <06A5FD1E-76EB-47C1-B24A-296F35EE3FFD@dpmms.cam.ac.uk>
	<878v1b5nw9.fsf@wanadoo.es>
Message-ID: 

On Sat, Jul 13, 2013 at 1:01 AM, Óscar Fuentes wrote:

> Jim Grosbach writes:
>
> > I strongly advise you not to use SjLj exception handling. Use
> > zero-cost-exceptions via DWARF instead.
>
> On Windows, DWARF EH doesn't work when there are stack frames not
> compiled with DWARF EH support (i.e. OS code that invokes user-provided
> callbacks, a common practice on Windows.)

You can also use the patches available on the list with support for Win64
EH and you will not have that problem, though 32-bit is still a problem.

-- 
João Matos
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From eda-qa at disemia.com  Fri Jul 12 21:02:33 2013
From: eda-qa at disemia.com (edA-qa mort-ora-y)
Date: Sat, 13 Jul 2013 06:02:33 +0200
Subject: [LLVMdev] Inlined call properly optimized, but not function itself
Message-ID: <51E0D159.1090402@disemia.com>

I saw something strange in the optimized LLVM code. It appears that
sometimes an inlined function is optimized though the function itself is
not optimized to the same level.
Below is an example of an unoptimized/non-inlined function call:

define void @_entry() uwtable {
entry:
  %0 = call i64 @eval_expr()
  call void @trace_integer(i64 %0)
  ret void
}

'eval_expr' is a big ugly series of branches, so I'm very happy that
LLVM figures it out and manages to optimize away the call entirely:

define void @_entry() uwtable {
entry:
  tail call void @trace_integer(i64 3)
  ret void
}

Obviously 'eval_expr' can be optimized to a constant value '3'. However,
the 'eval_expr' function doesn't actually reflect this optimization: it
still has a branch. Granted, it's a lot smaller than the input, but I'd
expect it to end up as just 'ret i64 3'.

define i64 @eval_expr() uwtable {
entry:
  %0 = extractvalue %0 { i1 true, i64 3 }, 0
  br i1 %0, label %switch_then_24, label %switch_end_2

switch_end_2:                          ; preds = %entry, %switch_then_24
  %1 = phi i64 [ %2, %switch_then_24 ], [ 4, %entry ]
  ret i64 %1

switch_then_24:                        ; preds = %entry
  %2 = extractvalue %0 { i1 true, i64 3 }, 1
  br label %switch_end_2
}

In other situations the full reduction is done. It appears that as I add
more branches, the likelihood of the function itself being fully reduced
decreases. Why is this happening, and how can I fix it?

-- 
edA-qa mort-ora-y
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
Sign: Please digitally sign your emails.
Encrypt: I'm also happy to receive encrypted mail.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 261 bytes
Desc: OpenPGP digital signature
URL: 

From artagnon at gmail.com  Sat Jul 13 02:53:05 2013
From: artagnon at gmail.com (Ramkumar Ramachandra)
Date: Sat, 13 Jul 2013 15:23:05 +0530
Subject: [LLVMdev] [PATCH] x86: disambiguate unqualified btr, bts
In-Reply-To: <13482539.3EiqhZXsap@aragorn.auenland.lan>
References: <1373484576-20213-1-git-send-email-artagnon@gmail.com>
	<32EF0723-5EE8-4E15-B3DD-482C0EEECFD9@apple.com>
	<383133607810849727@unknownmsgid>
	<13482539.3EiqhZXsap@aragorn.auenland.lan>
Message-ID: 

[Please be friendly towards non-subscribers and always reply-to-all; I
found your message in the list archives]

Joerg Sonnenberger wrote:
> > Eli Friedman wrote:
> > > The reason it's the right thing to do is that the mem/imm forms of
> > > btsw and btsl have exactly the same semantics.
> >
> > Not sure I understand this.
>
> There is no way to tell from the arguments whether bts should be btsw or
> btsl. That's the ambiguity it is complaining about. Fixing it is
> trivial. There are other cases with similar issues in the 8087 syntax
> with size of the floating point operand.

I see. Can you explain the results of this experiment to me?

Input #1:

	btrw $1, 0
	btr $1, 0
	btsw $1, 0
	bts $1, 0

Output #1 from gas:

	1 0000 660FBA34	btrw $1, 0
	1      25000000
	1      0001
	2 000a 0FBA3425	btr $1, 0
	2      00000000
	2      01
	3 0013 660FBA2C	btsw $1, 0
	3      25000000
	3      0001
	4 001d 0FBA2C25	bts $1, 0
	4      00000000
	4      01

Input #2:

	btrl $1, 0
	btr $1, 0
	btsl $1, 0
	bts $1, 0

Output #2 from gas:

	1 0000 0FBA3425	btrl $1, 0
	1      00000000
	1      01
	2 0009 0FBA3425	btr $1, 0
	2      00000000
	2      01
	3 0012 0FBA2C25	btsl $1, 0
	3      00000000
	3      01
	4 001b 0FBA2C25	bts $1, 0
	4      00000000
	4      01
	5

bts{w} is defined as:

	ins i16mem:$src1, i16i8imm:$src2

while bts{l} is defined as:

	ins i32mem:$src1, i32i8imm:$src2

My disambiguation of bts is defined as:

	def : InstAlias<"btr $imm, $mem", (BTR32mi8 i32mem:$mem, i32i8imm:$imm)>;

i.e.
I treat the first operand as an i32mem and the second as an i32i8imm,
to disambiguate bts to btsl; exactly like bt disambiguates to btl in
the previous line (from 824a907):

	def : InstAlias<"bt $imm, $mem", (BT32mi8 i32mem:$mem, i32i8imm:$imm)>;

What am I missing?

> On Thu, Jul 11, 2013 at 10:59:32AM +0530, Ramkumar Ramachandra wrote:
> > For the record, I don't think matching linux.git/gas is a problem:
> > they're very authoritative pieces of software that have been around
> > for a _really_ long time.
>
> That's a very, very weak argument. There are a lot of things Clang
> rejects as errors by default that has been used in old code bases,
> because GCC accepted it.

That was a personal opinion, and I've consistently demonstrated that I
have no experience or in-depth knowledge of assemblers. I'm not sure
what to do myself, but this disambiguation seems to be a reasonable
(not claiming correct or incorrect) solution.

My agenda is simple: I don't like the outdated cruft that GNU as is,
and I think a lot of codebases can benefit from the transition to
LLVM's assembler. That's not going to happen by sitting around doing
nothing.
From ismail at donmez.ws  Sat Jul 13 08:46:04 2013
From: ismail at donmez.ws (İsmail Dönmez)
Date: Sat, 13 Jul 2013 17:46:04 +0200
Subject: [LLVMdev] Multiple failures on PowerPC64
Message-ID: 

Hi,

I got multiple failures on PowerPC64, crashes are the same:

[ 1346s] 0  lli 0x00000000107bca00
[ 1346s] 1  lli 0x00000000107bcd14
[ 1346s] 2  lli 0x00000000107bcfb8
[ 1346s] 3  linux-vdso64.so.1 0x00000fff850f0418 __kernel_sigtramp_rt64 + 0
[ 1346s] 4  lli 0x00000000100bdc98
[ 1346s] 5  lli 0x00000000102f40d0
[ 1346s] 6  lli 0x00000000100180f4
[ 1346s] 7  libc.so.6 0x00000fff84c357ac
[ 1346s] 8  libc.so.6 0x00000fff84c359d4 __libc_start_main + 4293391356

The following tests fail:

LLVM :: ExecutionEngine/test-common-symbols.ll
LLVM :: ExecutionEngine/test-constantexpr.ll
LLVM :: ExecutionEngine/test-fp-no-external-funcs.ll
LLVM :: ExecutionEngine/test-fp.ll
LLVM :: ExecutionEngine/test-global-init-nonzero.ll
LLVM :: ExecutionEngine/test-global.ll
LLVM :: ExecutionEngine/test-interp-vec-arithm_float.ll
LLVM :: ExecutionEngine/test-interp-vec-arithm_int.ll
LLVM :: ExecutionEngine/test-interp-vec-loadstore.ll
LLVM :: ExecutionEngine/test-interp-vec-logical.ll
LLVM :: ExecutionEngine/test-interp-vec-setcond-fp.ll
LLVM :: ExecutionEngine/test-interp-vec-setcond-int.ll
LLVM :: ExecutionEngine/test-loadstore.ll
LLVM :: ExecutionEngine/test-local.ll
LLVM :: ExecutionEngine/test-logical.ll
LLVM :: ExecutionEngine/test-loop.ll
LLVM :: ExecutionEngine/test-phi.ll
LLVM :: ExecutionEngine/test-ret.ll
LLVM :: ExecutionEngine/test-return.ll
LLVM :: ExecutionEngine/test-setcond-fp.ll
LLVM :: ExecutionEngine/test-setcond-int.ll
LLVM :: ExecutionEngine/test-shift.ll

Any ideas?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From t.p.northover at gmail.com  Sat Jul 13 09:03:12 2013
From: t.p.northover at gmail.com (Tim Northover)
Date: Sat, 13 Jul 2013 17:03:12 +0100
Subject: [LLVMdev] Multiple failures on PowerPC64
In-Reply-To: 
References: 
Message-ID: 

Hi,

> LLVM :: ExecutionEngine/test-shift.ll
>
> Any ideas?

Those tests are for the old JIT, and shouldn't be run at all on PowerPC.
The host_arch field in your lit.site.cfg (somewhere in the build
directory) is probably being set incorrectly.

I've seen this when LLVM is cross-compiled with CMake before. I had to
set CMAKE_SYSTEM_PROCESSOR in my toolchain file (in your case to
"PowerPC"). But your situation could be completely different, of course.

So are you using autotools or CMake? And with what options?

Cheers.

Tim.

From s at pahtak.org  Sat Jul 13 10:15:30 2013
From: s at pahtak.org (Stephen Checkoway)
Date: Sat, 13 Jul 2013 13:15:30 -0400
Subject: [LLVMdev] [PATCH] x86: disambiguate unqualified btr, bts
In-Reply-To: 
References: <1373484576-20213-1-git-send-email-artagnon@gmail.com>
	<416B65E5-1813-4904-BBC0-882AD2F17712@apple.com>
	<32EF0723-5EE8-4E15-B3DD-482C0EEECFD9@apple.com>
	<9F4C233E-884F-4D63-9C48-6E7A9DBCEE56@apple.com>
Message-ID: <3295DE4F-3F83-441F-AD1B-6656015AD700@pahtak.org>

Eli Friedman wrote:
> The reason it's the right thing to do is that the mem/imm forms of
> btsw and btsl have exactly the same semantics.

The Intel documentation implies that this is the case:

> If the bit base operand specifies a memory location, it represents the
> address of the byte in memory that contains the bit base (bit 0 of the
> specified byte) of the bit string (see Figure 3-2). The offset operand
> then selects a bit position within the range -2^31 to 2^31 - 1 for a
> register offset and 0 to 31 for an immediate offset.

However, this doesn't seem to be true if the immediate value is greater
than 15. See the attached imm.c which prints the results of bts m16,imm8
(btsw) and bts m32,imm8 (btsl) with the immediate in [0,63].
For btsw, only the least significant 4 bits of the immediate seem to be
used whereas for btsl, only the least significant 5 bits seem to be used.

In contrast, bts m16,r16 (btsw) and bts m32,r32 (btsl) are identical for
bit offset operands in [0,63] and are likely identical for
[-2^{15}, 2^{15}-1], although I didn't actually check the others. See
the attached reg.c which changes the immediate constraints to register
constraints.

btr behaves similarly.

For the memory, immediate form without the suffix, it seems like the
options are

1. If the immediate value is in [0,15], use btsl/btrl since it saves a
   byte, otherwise error;
2. Follow both gas's behavior and the Solaris assembler manual Jim
   Grosbach linked to which stated that unsuffixed instructions are
   assumed to be long and alias bts to btsl and btr to btrl; or
3. Always error even when there is no ambiguity.

I have no opinion on which option LLVM should follow.

-- 
Stephen Checkoway

-------------- next part --------------
A non-text attachment was scrubbed...
Name: imm.c
Type: application/octet-stream
Size: 576 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: reg.c
Type: application/octet-stream
Size: 578 bytes
Desc: not available
URL: 

From dmitry at kernelgen.org  Sat Jul 13 16:20:26 2013
From: dmitry at kernelgen.org (Dmitry Mikushin)
Date: Sun, 14 Jul 2013 01:20:26 +0200
Subject: [LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all
In-Reply-To: 
References: <7EBA7A880CFD5440A1AA2C987DFACB2D37202BAD49@HQMAIL04.nvidia.com>
Message-ID: 

Hi Justin,

Would it be possible to implement emission of

	.pragma "nounroll";

from

	@unrollpragma = private addrspace(1) constant [17 x i8] c"#pragma unroll 1\00", align 4096

? It is used in math modules.

Thanks,
- D.

2013/6/5 Justin Holewinski 

> I would be glad to hear of any issues you have encountered on this path.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From sanjoy at playingwithpointers.com Sat Jul 13 17:18:27 2013 From: sanjoy at playingwithpointers.com (Sanjoy Das) Date: Sun, 14 Jul 2013 05:48:27 +0530 Subject: [LLVMdev] [PATCH v2] MC: handle .cfi_startproc simple In-Reply-To: <1373718636-9686-1-git-send-email-artagnon@gmail.com> References: <1373718636-9686-1-git-send-email-artagnon@gmail.com> Message-ID: Hi, After your change, EmitCFIStartProcImpl in MCAsmStreamer does not match the signature of the EmitCFIStartProcImpl in MCStreamer and you end up not overriding the original function. One of the places where "virtual void EmitCFIStartProcImpl(MCDwarfFrameInfo &Frame) = 0" would have helped. :) Specifically, adding virtual void EmitCFIStartProcImpl(MCDwarfFrameInfo &Frame) { EmitCFIStartProcImpl(Frame, false); } to MCAsmStreamer gets llvm to pass all tests. I'm not familiar with this part of llvm, but my guess about the right thing to do is to change EmitCFIStartProcImpl in MCStreamer to take the Simple flag; and have _all_ subclasses of MCStreamer respect that. -- Sanjoy Das http://playingwithpointers.com From sebpop at gmail.com Sat Jul 13 22:21:44 2013 From: sebpop at gmail.com (Sebastian Pop) Date: Sun, 14 Jul 2013 00:21:44 -0500 Subject: [LLVMdev] Analysis of polly-detect overhead in oggenc In-Reply-To: <73bc0067.345f.13fdb666d45.Coremail.tanmx_star@yeah.net> References: <51E02B5C.3030208@grosser.es> <727cd06.251b.13fd9059546.Coremail.tanmx_star@yeah.net> <51E19CAF.4050500@grosser.es> <73bc0067.345f.13fdb666d45.Coremail.tanmx_star@yeah.net> Message-ID: Hi, I think this should also go on the llvm-dev mailing list... On Sat, Jul 13, 2013 at 11:18 PM, Star Tan wrote: > > At 2013-07-14 02:30:07,"Tobias Grosser" wrote: >>On 07/13/2013 10:13 AM, Star Tan wrote: >>> Hi Tobias, >> >>Hi Star, >> >>thanks for the update. I copied the polly mailing list, as I believe >>this discussion is very valuable. 
>> >>> I tried to dig into oggenc.ll, one of the two cases we had for >>> scop-detect compile-time overhead, but I may need your help. >>> >>> First, we cannot simply reduce code size by changing part of function >>> defines to declares. Experimental results shows that the relative >>> scop-detect time always goes down as we reduce the code size. That is >>> similar to what you have pointed out in your previous mail. >>> >>> For example, if we divide the oggen.ll into oggen-top-half.ll (replace >>> all function defines into declares in the second half) and >>> oggen-bottom-half.ll (replace all function defines into declares in the >>> first half) , then their compile-time would be: >>> >>> //Data explain: total time is 4.428s, the absolute time of Polly-detect >>> is 0.8780s and its relative compile time percentage is 20.1%. >>> oggenc.ll: Total(4.428s), Polly-Detect(0.8780s, 20.1%) >>> oggenc-top-half.ll: Total(2.680s), Polly-Detect(0.3828s, 14.2%) >>> oggenc-bottom-half.ll: Total(1.552s), Polly-Detect(0.2608s, 16.9%) >>> >>> To investigate the relationship between the relative scop-detect time and >>> source code size, I also evaluated some big files based on oggenc.ll. >>> For example, we can generate a twice bigger file oggenc2.ll with the >>> following command: >>> cat oggenc.ll oggenc.ll > oggenc2.ll //write two copies >>> of oggen.ll into oggen2.ll >>> '<,'> s/define \(.*\)@/define \1 at t2_ //replace all defines >>> in the second half to avoid redefinition >>> '<,'> s/declare \(.*\)@/declare \1 at t2_ //replace all declares >>> in the second half to avoid redefinition >>> Similarly, we can generate oggenc4.ll (four copies of oggenc.ll) and >>> oggen8.ll (eight copies of oggenc.ll). 
Their compile-time would be: >>> >>> oggenc: Total( 4.428s), Polly-Detect(0.8780s, 20.1%) >>> oggenc*2: Total(10.204s), Polly-Detect(2.8121s, 27.8%) >>> oggenc*4: Total(24.748s), Polly-Detect(9.9507s, 40.4%) >>> oggenc*8: Total(69.808s), Polly-Detect(39.4136, 56.6%) >>> >>> Results show that the percentage of scop-detect compile time is >>> significantly increased as the code size is getting bigger. This can also >>> explain why our two cases oggenc.ll (5.9M) and tramp3d-v4.ll(19M) are both >>> big file. >> >>Wow, this is an extremely interesting finding. >> >>> Second, I still have not found the reason why the relative scop-detect >>> compile time percentage is significantly increased as the compiled code size >>> is getting bigger. >>> >>> My initial idea is that the global variable "FunctionSet >>> InvalidFunctions" (declared in include/polly/ScopDetection.h) may increase >>> compile-time because it always contains all functions. Since the variable >>> InvalidFunctions never releases its resource, the InvalidFunctions.count() >>> operation in ScopDetection.cpp would increases when more function pointers >>> are put into this Set container. Based on this idea, I attached a patch file >>> to fix this problem. Unfortunately, results show that the compile-time of >>> Polly-detect pass has no change with this patch file. >>> >>> I doubt that the key should be some scop-detect operations, for which the >>> compile-time would increase as the code size or the number of functions >>> increase, but I have not found such operations. Do you have any suggestions? >> >>Before we write a patch, we should do some profiling to understand where >>the overhead comes from. I propose you generate oggenc*16 or even >>oggen*32 to ensure we get to about 90% Polly-Detect overhead. >> >>I would then run Polly under linux 'perf'. Using 'perf record polly-opt >>...' and then 'perf report'. If we are lucky, this points us exactly to >>the function we spend all the time in. 
>> >>Cheers, >>Tobias > > Thanks for your very useful suggestion. > I have profiled the oggenc*16 and oggenc*32 and the results are listed as > follows: > > oggenc*16: polly-detect compile-time percentage is 71.3%. The top five > functions reported by perf are: > 48.97% opt opt [.] > llvm::TypeFinder::run(llvm::Module const&, bool) > 7.43% opt opt [.] > llvm::TypeFinder::incorporateType(llvm::Type*) > 7.36% opt opt [.] > llvm::TypeFinder::incorporateValue(llvm::Value const*) > 4.04% opt libc-2.17.so [.] 0x0000000000138bea > 2.06% opt [kernel.kallsyms] [k] 0xffffffff81043e6a > > oggenc*32: polly-detect compile-time percentage is 82.9%. The top five > functions reported by perf are: > 57.44% opt opt [.] > llvm::TypeFinder::run(llvm::Module const&, bool) > 11.51% opt opt [.] > llvm::TypeFinder::incorporateType(llvm::Type*) > 7.54% opt opt [.] > llvm::TypeFinder::incorporateValue(llvm::Value const*) > 2.66% opt libc-2.17.so [.] 0x0000000000138c02 > 2.26% opt opt [.] > llvm::SlotTracker::processModule() > > It is surprise that all compile-time for TypeFinder is added into the > compile-time for Polly-detect, but I cannot find the any call instructions > to TypeFinder in Polly-detect. > > Do you have further suggestion? > You can try to set a breakpoint with gdb on llvm::TypeFinder::run and then see the backtrace to figure out where this is called from Polly. Thanks, Sebastian > Best wishes! > Star Tan > > -- > You received this message because you are subscribed to the Google Groups > "Polly Development" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to polly-dev+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. 
> > From nrotem at apple.com Sat Jul 13 23:30:33 2013 From: nrotem at apple.com (Nadav Rotem) Date: Sat, 13 Jul 2013 23:30:33 -0700 Subject: [LLVMdev] Enabling the SLP vectorizer by default for -O3 Message-ID: Hi, LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in a straight-line code. It is currently not enabled by default, and people who want to experiment with it can use the clang command line flag “-fslp-vectorize”. I ran LLVM’s test suite with and without the SLP vectorizer on a Sandybridge mac (using SSE4, w/o AVX). Based on my performance measurements (below) I would like to enable the SLP-vectorizer by default on -O3. I would like to hear what others in the community think about this and give other people the opportunity to perform their own performance measurements. — Performance Gains — SingleSource/Benchmarks/Misc/matmul_f64_4x4 -53.68% MultiSource/Benchmarks/Olden/power/power -18.55% MultiSource/Benchmarks/TSVC/LoopRerolling-flt/LoopRerolling-flt -14.71% SingleSource/Benchmarks/Misc/flops-6 -11.02% SingleSource/Benchmarks/Misc/flops-5 -10.03% MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt -8.37% External/Nurbs/nurbs -7.98% SingleSource/Benchmarks/Misc/pi -7.29% External/SPEC/CINT2000/252_eon/252_eon -5.78% External/SPEC/CFP2006/444_namd/444_namd -4.52% External/SPEC/CFP2000/188_ammp/188_ammp -4.45% MultiSource/Applications/SIBsim4/SIBsim4 -3.58% MultiSource/Benchmarks/TSVC/LoopRerolling-dbl/LoopRerolling-dbl -3.52% SingleSource/Benchmarks/Misc-C++/Large/sphereflake -2.96% MultiSource/Benchmarks/TSVC/LinearDependence-dbl/LinearDependence-dbl -2.75% MultiSource/Benchmarks/VersaBench/beamformer/beamformer -2.70% MultiSource/Benchmarks/TSVC/NodeSplitting-dbl/NodeSplitting-dbl -1.95% SingleSource/Benchmarks/Misc/flops -1.89% SingleSource/Benchmarks/Misc/oourafft -1.71% MultiSource/Benchmarks/mafft/pairlocalalign -1.16% External/SPEC/CFP2006/447_dealII/447_dealII -1.06% — Regressions — 
MultiSource/Benchmarks/Olden/bh/bh 22.47% MultiSource/Benchmarks/Bullet/bullet 7.31% SingleSource/Benchmarks/Misc-C++-EH/spirit 5.68% SingleSource/Benchmarks/SmallPT/smallpt 3.91% Thanks, Nadav From chandlerc at google.com Sun Jul 14 00:07:50 2013 From: chandlerc at google.com (Chandler Carruth) Date: Sun, 14 Jul 2013 00:07:50 -0700 Subject: [LLVMdev] Enabling the SLP vectorizer by default for -O3 In-Reply-To: References: Message-ID: Cool! What changes have you seen to generated code size? I'll take it for a spin on our benchmarks. On Sat, Jul 13, 2013 at 11:30 PM, Nadav Rotem wrote: > Hi, > > LLVM’s SLP-vectorizer is a new pass that combines similar independent > instructions in a straight-line code. It is currently not enabled by > default, and people who want to experiment with it can use the clang > command line flag “-fslp-vectorize”. I ran LLVM’s test suite with and > without the SLP vectorizer on a Sandybridge mac (using SSE4, w/o AVX). > Based on my performance measurements (below) I would like to enable the > SLP-vectorizer by default on -O3. I would like to hear what others in the > community think about this and give other people the opportunity to perform > their own performance measurements. 
> > — Performance Gains — > SingleSource/Benchmarks/Misc/matmul_f64_4x4 -53.68% > MultiSource/Benchmarks/Olden/power/power -18.55% > MultiSource/Benchmarks/TSVC/LoopRerolling-flt/LoopRerolling-flt -14.71% > SingleSource/Benchmarks/Misc/flops-6 -11.02% > SingleSource/Benchmarks/Misc/flops-5 -10.03% > MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt > -8.37% > External/Nurbs/nurbs -7.98% > SingleSource/Benchmarks/Misc/pi -7.29% > External/SPEC/CINT2000/252_eon/252_eon -5.78% > External/SPEC/CFP2006/444_namd/444_namd -4.52% > External/SPEC/CFP2000/188_ammp/188_ammp -4.45% > MultiSource/Applications/SIBsim4/SIBsim4 -3.58% > MultiSource/Benchmarks/TSVC/LoopRerolling-dbl/LoopRerolling-dbl -3.52% > SingleSource/Benchmarks/Misc-C++/Large/sphereflake -2.96% > MultiSource/Benchmarks/TSVC/LinearDependence-dbl/LinearDependence-dbl > -2.75% > MultiSource/Benchmarks/VersaBench/beamformer/beamformer -2.70% > MultiSource/Benchmarks/TSVC/NodeSplitting-dbl/NodeSplitting-dbl -1.95% > SingleSource/Benchmarks/Misc/flops -1.89% > SingleSource/Benchmarks/Misc/oourafft -1.71% > MultiSource/Benchmarks/mafft/pairlocalalign -1.16% > External/SPEC/CFP2006/447_dealII/447_dealII -1.06% > > — Regressions — > MultiSource/Benchmarks/Olden/bh/bh 22.47% > MultiSource/Benchmarks/Bullet/bullet 7.31% > SingleSource/Benchmarks/Misc-C++-EH/spirit 5.68% > SingleSource/Benchmarks/SmallPT/smallpt 3.91% > > Thanks, > Nadav > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From nrotem at apple.com Sun Jul 14 00:09:51 2013 From: nrotem at apple.com (Nadav Rotem) Date: Sun, 14 Jul 2013 00:09:51 -0700 Subject: [LLVMdev] Enabling the SLP vectorizer by default for -O3 In-Reply-To: References: Message-ID: <9C9945BA-2130-44D0-A672-4A16944C4E35@apple.com> > > What changes have you seen to generated code size? > I did not measure code size. > I'll take it for a spin on our benchmarks. > Thanks! -------------- next part -------------- An HTML attachment was scrubbed... URL: From artagnon at gmail.com Sun Jul 14 01:44:26 2013 From: artagnon at gmail.com (Ramkumar Ramachandra) Date: Sun, 14 Jul 2013 14:14:26 +0530 Subject: [LLVMdev] [PATCH] x86: disambiguate unqualified btr, bts In-Reply-To: <3295DE4F-3F83-441F-AD1B-6656015AD700@pahtak.org> References: <1373484576-20213-1-git-send-email-artagnon@gmail.com> <416B65E5-1813-4904-BBC0-882AD2F17712@apple.com> <32EF0723-5EE8-4E15-B3DD-482C0EEECFD9@apple.com> <9F4C233E-884F-4D63-9C48-6E7A9DBCEE56@apple.com> <3295DE4F-3F83-441F-AD1B-6656015AD700@pahtak.org> Message-ID: Stephen Checkoway wrote: > [...] Thanks for the absolutely splendid analysis! > For the memory, immediate form without the suffix, it seems like the options are > 1. If the immediate value is in [0,15], use btsl/btrl since it saves a byte, otherwise error; > 2. Follow both gas's behavior and the Solaris assembler manual Jim Grosbach linked to which stated that unsuffixed instructions are assumed to be long and alias bts to btsl and btr to btrl; or > 3. Always error even when there is no ambiguity. > > I have no opinion on which option LLVM should follow. Okay, so while digging through the history of the linux.git tree, I found this. The patch _replaces_ btrl instructions with btr instructions, for seemingly good reason. What is your opinion on the issue? 
commit 1c54d77078056cde0f195b1a982cb681850efc08 Author: Jeremy Fitzhardinge Date: 5 years ago x86: partial unification of asm-x86/bitops.h This unifies the set/clear/test bit functions of asm/bitops.h. I have not attempted to merge the bit-finding functions, since they rely on the machine word size and can't be easily restructured to work generically without a lot of #ifdefs. In particular, the 64-bit code can assume the presence of conditional move instructions, whereas 32-bit needs to be more careful. The inline assembly for the bit operations has been changed to remove explicit sizing hints on the instructions, so the assembler will pick the appropriate instruction forms depending on the architecture and the context. Signed-off-by: Jeremy Fitzhardinge Cc: Andi Kleen Cc: Linus Torvalds Signed-off-by: Ingo Molnar Signed-off-by: Thomas Gleixner From tanmx_star at yeah.net Sun Jul 14 02:26:19 2013 From: tanmx_star at yeah.net (Star Tan) Date: Sun, 14 Jul 2013 17:26:19 +0800 (CST) Subject: [LLVMdev] Analysis of polly-detect overhead in oggenc In-Reply-To: <51E2352A.6090706@grosser.es> References: <51E02B5C.3030208@grosser.es> <727cd06.251b.13fd9059546.Coremail.tanmx_star@yeah.net> <51E19CAF.4050500@grosser.es> <73bc0067.345f.13fdb666d45.Coremail.tanmx_star@yeah.net> <51E2352A.6090706@grosser.es> Message-ID: <7aa4b9a.3bcb.13fdc808c5f.Coremail.tanmx_star@yeah.net> At 2013-07-14 13:20:42,"Tobias Grosser" wrote: >On 07/13/2013 09:18 PM, Star Tan wrote: >> >> >> At 2013-07-14 02:30:07,"Tobias Grosser" wrote: >>> On 07/13/2013 10:13 AM, Star Tan wrote: >>>> Hi Tobias, >>> >>> Hi Star, >[...] >>> Before we write a patch, we should do some profiling to understand where >>> the overhead comes from. I propose you generate oggenc*16 or even >>> oggen*32 to ensure we get to about 90% Polly-Detect overhead. >>> >>> I would then run Polly under linux 'perf'. Using 'perf record polly-opt >>> ...' and then 'perf report'. 
If we are lucky, this points us exactly to >>> the function we spend all the time in. >>> >>> Cheers, >>> Tobias >> >> Thanks for your very useful suggestion. >> I have profiled the oggenc*16 and oggenc*32 and the results are listed as follows: >> >> oggenc*16: polly-detect compile-time percentage is 71.3%. The top five functions reported by perf are: >> 48.97% opt opt [.] llvm::TypeFinder::run(llvm::Module const&, bool) >> 7.43% opt opt [.] llvm::TypeFinder::incorporateType(llvm::Type*) >> 7.36% opt opt [.] llvm::TypeFinder::incorporateValue(llvm::Value const*) >> 4.04% opt libc-2.17.so [.] 0x0000000000138bea >> 2.06% opt [kernel.kallsyms] [k] 0xffffffff81043e6a >> >> oggenc*32: polly-detect compile-time percentage is 82.9%. The top five functions reported by perf are: >> 57.44% opt opt [.] llvm::TypeFinder::run(llvm::Module const&, bool) >> 11.51% opt opt [.] llvm::TypeFinder::incorporateType(llvm::Type*) >> 7.54% opt opt [.] llvm::TypeFinder::incorporateValue(llvm::Value const*) >> 2.66% opt libc-2.17.so [.] 0x0000000000138c02 >> 2.26% opt opt [.] llvm::SlotTracker::processModule() >> >> It is surprise that all compile-time for TypeFinder is added into the compile-time for Polly-detect, but I cannot find the any call instructions to TypeFinder in Polly-detect. > >Yes, this does not seem very conclusive. We probably need a call graph >to see where those are called. > >Did you try running 'perf record' with the '-g' option? This should give >you callgraph information, that should be very helpful to track down the >callers in Polly. Also, if you prefer a graphical view of the >results, you may want to have a look at Gprof2Dot [1]. Finally, if this >all does not work, just running Polly in gdb and randomly breaking a >couple of times (manual sampling), may possibly hint you to the right place. > I also tried perf with -g, but it report nothing useful. the result of perf -g is: - 48.70% opt opt [.] 
llvm::TypeFinder::run(llvm::Module const&, bool) ` - llvm::TypeFinder::run(llvm::Module const&, bool) + 43.34% 0 - 1.78% 0x480031 + llvm::LoadInst::~LoadInst() - 1.41% 0x460031 + llvm::LoadInst::~LoadInst() - 1.01% 0x18 llvm::BranchInst::~BranchInst() 0x8348007d97fa3d8d - 0.87% 0x233 + llvm::GetElementPtrInst::~GetElementPtrInst() - 0.57% 0x39 + llvm::SExtInst::~SExtInst() - 0.54% 0x460032 + llvm::StoreInst::~StoreInst() GDB is a useful tool! Thanks for Sebastian's advice! By setting a break point on llvm::TypeFinder::run(llvm::Module const&, bool), I find most of calling cases are issued from the following two callsites: 0xb7c1c5d2 in polly::ScopDetection::isValidMemoryAccess(llvm::Instruction&, polly::ScopDetection::DetectionContext&) const () 0xb7c1d754 in polly::ScopDetection::isValidInstruction(llvm::Instruction&, polly::ScopDetection::DetectionContext&) const () The detailed backtrace of "isValidMemoryAccess" is: #0 0x0907b780 in llvm::TypeFinder::run(llvm::Module const&, bool) () #1 0x08f76ebe in llvm::TypePrinting::incorporateTypes(llvm::Module const&) () #2 0x08f76fc9 in llvm::AssemblyWriter::init() () #3 0x08f77176 in llvm::AssemblyWriter::AssemblyWriter(llvm::formatted_raw_ostream&, llvm::SlotTracker&, llvm::Module const*, llvm::AssemblyAnnotationWriter*) () #4 0x08f79d1a in llvm::Value::print(llvm::raw_ostream&, llvm::AssemblyAnnotationWriter*) const () #5 0xb7c1d044 in polly::ScopDetection::isValidInstruction(llvm::Instruction&, polly::ScopDetection::DetectionContext&) const () from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so #6 0xb7c1ea75 in polly::ScopDetection::allBlocksValid(polly::ScopDetection::DetectionContext&) const () from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so #7 0xb7c1f4aa in polly::ScopDetection::isValidRegion(polly::ScopDetection::DetectionContext&) const () from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so #8 0xb7c1fd16 in 
polly::ScopDetection::findScops(llvm::Region&) () from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so #9 0xb7c1fd81 in polly::ScopDetection::findScops(llvm::Region&) () from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so #10 0xb7c206f7 in polly::ScopDetection::runOnFunction(llvm::Function&) () from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so #11 0x09065fdd in llvm::FPPassManager::runOnFunction(llvm::Function&) () #12 0x09067e2b in llvm::FunctionPassManagerImpl::run(llvm::Function&) () #13 0x09067f6d in llvm::FunctionPassManager::run(llvm::Function&) () #14 0x081e6040 in main () >Also, can you upload the .ll file somewhere, such that I can access it? >(Please do not attach it to the email) I have attached the source code of oggenc.c and oggen.ll in the bug r16624: http://llvm.org/bugs/show_bug.cgi?id=16624 Best wishes, Star Tan -------------- next part -------------- An HTML attachment was scrubbed... URL: From t.p.northover at gmail.com Sun Jul 14 02:29:22 2013 From: t.p.northover at gmail.com (Tim Northover) Date: Sun, 14 Jul 2013 09:29:22 +0000 Subject: [LLVMdev] [PATCH] x86: disambiguate unqualified btr, bts In-Reply-To: References: <1373484576-20213-1-git-send-email-artagnon@gmail.com> <416B65E5-1813-4904-BBC0-882AD2F17712@apple.com> <32EF0723-5EE8-4E15-B3DD-482C0EEECFD9@apple.com> <9F4C233E-884F-4D63-9C48-6E7A9DBCEE56@apple.com> <3295DE4F-3F83-441F-AD1B-6656015AD700@pahtak.org> Message-ID: > The patch _replaces_ btrl instructions with btr > instructions, for seemingly good reason. What is your opinion on the > issue? Mine is it would be a sympathetic reason if correct, but not good if the instructions shouldn't exist in the first place. 
However: (From the commit message): > The inline assembly for the bit operations has been changed to remove > explicit sizing hints on the instructions, so the assembler will pick > the appropriate instruction forms depending on the architecture and > the context. It doesn't do this, as far as I can tell. I cannot make the unsuffixed versions do anything except an 'l' access in either gas or gcc, regardless of architecture, pointer size or anything else. What the code actually does is produce "btsl $imm, addr" if 0 <= imm <=31 (at compile-time) and "btsl %eax, addr" otherwise, which works even for 64-bit types (by Stephen's analysis), but involves an extra "movl $imm, %eax". >From the commit message, it seems like they *do* intend it to produce "btslq $imm, addr" where possible and might well be open to fixing this as a bug (that just happens to help Linux compile with Clang) rather than purely for toolchain compatibility. That said, it's not clear how to enable both forms to be generated easily from inline asm, but I'm not an expert. Tim. From ismail at donmez.ws Sun Jul 14 05:39:16 2013 From: ismail at donmez.ws (=?UTF-8?B?xLBzbWFpbCBEw7ZubWV6?=) Date: Sun, 14 Jul 2013 14:39:16 +0200 Subject: [LLVMdev] Multiple failures on PowerPC64 In-Reply-To: References: Message-ID: Hi, On Sat, Jul 13, 2013 at 6:03 PM, Tim Northover wrote: > Hi, > > > LLVM :: ExecutionEngine/test-shift.ll > > > > Any ideas? > > Those tests are for the old JIT, and shouldn't be run at all on > PowerPC. The host_arch field in your lit.site.cfg (somewhere in the > build directory) is probably being set incorrectly. > > I've seen this when LLVM is cross-compiled with CMake before. I had to > set CMAKE_SYSTEM_PROCESSOR in my toolchain file (in your case to > "PowerPC"). But your situation could be completely different, of > course. > > So are you using autotools or CMake? And with what options? 
> Using cmake, looks like it sets host_arch as ppc64 and hence the lit fails to parse it as PowerPC I guess. Will do more debugging and report. Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From legalize at xmission.com Sun Jul 14 06:55:29 2013 From: legalize at xmission.com (Richard) Date: Sun, 14 Jul 2013 07:55:29 -0600 Subject: [LLVMdev] Windows reviewers still needed? Message-ID: I watched some LLVM videos yesterday and there was one where Chandler Carruth was saying that LLVM/clang needed more Windows savvy developers to review patches. Is this still the case? If so, how do I contribute? -- "The Direct3D Graphics Pipeline" free book The Computer Graphics Museum The Terminals Wiki Legalize Adulthood! (my blog) From sebpop at gmail.com Sun Jul 14 08:01:18 2013 From: sebpop at gmail.com (Sebastian Pop) Date: Sun, 14 Jul 2013 10:01:18 -0500 Subject: [LLVMdev] Analysis of polly-detect overhead in oggenc In-Reply-To: <7aa4b9a.3bcb.13fdc808c5f.Coremail.tanmx_star@yeah.net> References: <51E02B5C.3030208@grosser.es> <727cd06.251b.13fd9059546.Coremail.tanmx_star@yeah.net> <51E19CAF.4050500@grosser.es> <73bc0067.345f.13fdb666d45.Coremail.tanmx_star@yeah.net> <51E2352A.6090706@grosser.es> <7aa4b9a.3bcb.13fdc808c5f.Coremail.tanmx_star@yeah.net> Message-ID: Tobi, it looks like this code is the problem: for (std::vector<Value *>::iterator PI = Pointers.begin(), PE = Pointers.end(); ;) { Value *V = *PI; if (V->getName().size() == 0) OS << "\"" << *V << "\""; else OS << "\"" << V->getName() << "\""; ++PI; if (PI != PE) OS << ", "; else break; } INVALID_NOVERIFY(Alias, OS.str()); it prints values to OS even when debug is not turned on: if you remember, I have already sent a patch to conditionally execute this code in a DEBUG() stmt. You rejected that patch because it would not work for other reporting tools -polly-show or some other flags. 
On Sun, Jul 14, 2013 at 4:26 AM, Star Tan wrote: > > At 2013-07-14 13:20:42,"Tobias Grosser" wrote: >>On 07/13/2013 09:18 PM, Star Tan wrote: >>> >>> >>> At 2013-07-14 02:30:07,"Tobias Grosser" wrote: >>>> On 07/13/2013 10:13 AM, Star Tan wrote: >>>>> Hi Tobias, >>>> >>>> Hi Star, >>[...] >>>> Before we write a patch, we should do some profiling to understand where >>>> the overhead comes from. I propose you generate oggenc*16 or even >>>> oggen*32 to ensure we get to about 90% Polly-Detect overhead. >>>> >>>> I would then run Polly under linux 'perf'. Using 'perf record polly-opt >>>> ...' and then 'perf report'. If we are lucky, this points us exactly to >>>> the function we spend all the time in. >>>> >>>> Cheers, >>>> Tobias >>> >>> Thanks for your very useful suggestion. >>> I have profiled the oggenc*16 and oggenc*32 and the results are listed as >>> follows: >>> >>> oggenc*16: polly-detect compile-time percentage is 71.3%. The top five >>> functions reported by perf are: >>> 48.97% opt opt [.] >>> llvm::TypeFinder::run(llvm::Module const&, bool) >>> 7.43% opt opt [.] >>> llvm::TypeFinder::incorporateType(llvm::Type*) >>> 7.36% opt opt [.] >>> llvm::TypeFinder::incorporateValue(llvm::Value const*) >>> 4.04% opt libc-2.17.so [.] 0x0000000000138bea >>> 2.06% opt [kernel.kallsyms] [k] 0xffffffff81043e6a >>> >>> oggenc*32: polly-detect compile-time percentage is 82.9%. The top five >>> functions reported by perf are: >>> 57.44% opt opt [.] >>> llvm::TypeFinder::run(llvm::Module const&, bool) >>> 11.51% opt opt [.] >>> llvm::TypeFinder::incorporateType(llvm::Type*) >>> 7.54% opt opt [.] >>> llvm::TypeFinder::incorporateValue(llvm::Value const*) >>> 2.66% opt libc-2.17.so [.] 0x0000000000138c02 >>> 2.26% opt opt [.] >>> llvm::SlotTracker::processModule() >>> >>> It is surprise that all compile-time for TypeFinder is added into the >>> compile-time for Polly-detect, but I cannot find the any call instructions >>> to TypeFinder in Polly-detect. 
>> >>Yes, this does not seem very conclusive. We probably need a call graph >>to see where those are called. >> >>Did you try running 'perf record' with the '-g' option? This should give >>you callgraph information, that should be very helpful to track down the >>callers in Polly. Also, if you prefer a graphical view of the >>results, you may want to have a look at Gprof2Dot [1]. Finally, if this >>all does not work, just running Polly in gdb and randomly breaking a >>couple of times (manual sampling), may possibly hint you to the right >> place. >> > > I also tried perf with -g, but it report nothing useful. the result of perf > -g is: > - 48.70% opt opt [.] > llvm::TypeFinder::run(llvm::Module const&, bool) > ` > > - llvm::TypeFinder::run(llvm::Module const&, bool) > + 43.34% 0 > - 1.78% 0x480031 > + llvm::LoadInst::~LoadInst() > - 1.41% 0x460031 > + llvm::LoadInst::~LoadInst() > - 1.01% 0x18 > llvm::BranchInst::~BranchInst() > 0x8348007d97fa3d8d > - 0.87% 0x233 > + llvm::GetElementPtrInst::~GetElementPtrInst() > - 0.57% 0x39 > + llvm::SExtInst::~SExtInst() > - 0.54% 0x460032 > + llvm::StoreInst::~StoreInst() > > > GDB is a useful tool! Thanks for Sebastian's advice! 
> > By setting a break point on llvm::TypeFinder::run(llvm::Module const&, > bool), I find most of calling cases are issued from the following two > callsites: > 0xb7c1c5d2 in polly::ScopDetection::isValidMemoryAccess(llvm::Instruction&, > polly::ScopDetection::DetectionContext&) const () > 0xb7c1d754 in polly::ScopDetection::isValidInstruction(llvm::Instruction&, > polly::ScopDetection::DetectionContext&) const () > > The detailed backtrace of "isValidMemoryAccess" is: > #0 0x0907b780 in llvm::TypeFinder::run(llvm::Module const&, bool) () > #1 0x08f76ebe in llvm::TypePrinting::incorporateTypes(llvm::Module const&) > () > #2 0x08f76fc9 in llvm::AssemblyWriter::init() () > #3 0x08f77176 in > llvm::AssemblyWriter::AssemblyWriter(llvm::formatted_raw_ostream&, > llvm::SlotTracker&, llvm::Module const*, llvm::AssemblyAnnotationWriter*) () > #4 0x08f79d1a in llvm::Value::print(llvm::raw_ostream&, > llvm::AssemblyAnnotationWriter*) const () > #5 0xb7c1d044 in > polly::ScopDetection::isValidInstruction(llvm::Instruction&, > polly::ScopDetection::DetectionContext&) const () > from > /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so > #6 0xb7c1ea75 in > polly::ScopDetection::allBlocksValid(polly::ScopDetection::DetectionContext&) > const () > from > /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so > #7 0xb7c1f4aa in > polly::ScopDetection::isValidRegion(polly::ScopDetection::DetectionContext&) > const () > from > /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so > #8 0xb7c1fd16 in polly::ScopDetection::findScops(llvm::Region&) () > from > /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so > #9 0xb7c1fd81 in polly::ScopDetection::findScops(llvm::Region&) () > from > /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so > #10 0xb7c206f7 in polly::ScopDetection::runOnFunction(llvm::Function&) () > from > 
/home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so > #11 0x09065fdd in llvm::FPPassManager::runOnFunction(llvm::Function&) () > #12 0x09067e2b in llvm::FunctionPassManagerImpl::run(llvm::Function&) () > #13 0x09067f6d in llvm::FunctionPassManager::run(llvm::Function&) () > #14 0x081e6040 in main () > > > > >Also, can you upload the .ll file somewhere, such that I can access it? >>(Please do not attach it to the email) > > I have attached the source code of oggenc.c and oggen.ll in the bug r16624: > http://llvm.org/bugs/show_bug.cgi?id=16624 > > Best wishes, > Star Tan > > > > -- > You received this message because you are subscribed to the Google Groups > "Polly Development" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to polly-dev+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. > > From anton at korobeynikov.info Sun Jul 14 08:04:39 2013 From: anton at korobeynikov.info (Anton Korobeynikov) Date: Sun, 14 Jul 2013 19:04:39 +0400 Subject: [LLVMdev] Windows reviewers still needed? In-Reply-To: References: Message-ID: > Is this still the case? If so, how do I contribute? Simply review windows patches, no? 
-- With best regards, Anton Korobeynikov Faculty of Mathematics and Mechanics, Saint Petersburg State University From tanmx_star at yeah.net Sun Jul 14 08:05:10 2013 From: tanmx_star at yeah.net (Star Tan) Date: Sun, 14 Jul 2013 23:05:10 +0800 (CST) Subject: [LLVMdev] Analysis of polly-detect overhead in oggenc In-Reply-To: <7aa4b9a.3bcb.13fdc808c5f.Coremail.tanmx_star@yeah.net> References: <51E02B5C.3030208@grosser.es> <727cd06.251b.13fd9059546.Coremail.tanmx_star@yeah.net> <51E19CAF.4050500@grosser.es> <73bc0067.345f.13fdb666d45.Coremail.tanmx_star@yeah.net> <51E2352A.6090706@grosser.es> <7aa4b9a.3bcb.13fdc808c5f.Coremail.tanmx_star@yeah.net> Message-ID: <32e1ec0e.464a.13fddb6c5c7.Coremail.tanmx_star@yeah.net> I have found that the extremely expensive compile-time overhead comes from the string buffer operation for "INVALID" MACRO in the polly-detect pass. Attached is a hack patch file that simply remove the string buffer operation. This patch file can significantly reduce compile-time overhead when compiling big source code. For example, for oggen*8.ll, the compile time is reduced from 40.5261 ( 51.2%) to 5.8813s (15.9%) with this patch file. However, this patch file does not solve this problem. It only shows the reason why polly-detect pass leads to significant compile-time overhead. Best wishes, Star Tan. At 2013-07-14 17:26:19,"Star Tan" wrote: At 2013-07-14 13:20:42,"Tobias Grosser" wrote: >On 07/13/2013 09:18 PM, Star Tan wrote: >> >> >> At 2013-07-14 02:30:07,"Tobias Grosser" wrote: >>> On 07/13/2013 10:13 AM, Star Tan wrote: >>>> Hi Tobias, >>> >>> Hi Star, >[...] >>> Before we write a patch, we should do some profiling to understand where >>> the overhead comes from. I propose you generate oggenc*16 or even >>> oggen*32 to ensure we get to about 90% Polly-Detect overhead. >>> >>> I would then run Polly under linux 'perf'. Using 'perf record polly-opt >>> ...' and then 'perf report'. 
>>> If we are lucky, this points us exactly to
>>> the function we spend all the time in.
>>>
>>> Cheers,
>>> Tobias
>>
>> Thanks for your very useful suggestion.
>> I have profiled oggenc*16 and oggenc*32, and the results are listed as follows:
>>
>> oggenc*16: the polly-detect compile-time percentage is 71.3%. The top five functions reported by perf are:
>> 48.97% opt opt [.] llvm::TypeFinder::run(llvm::Module const&, bool)
>> 7.43% opt opt [.] llvm::TypeFinder::incorporateType(llvm::Type*)
>> 7.36% opt opt [.] llvm::TypeFinder::incorporateValue(llvm::Value const*)
>> 4.04% opt libc-2.17.so [.] 0x0000000000138bea
>> 2.06% opt [kernel.kallsyms] [k] 0xffffffff81043e6a
>>
>> oggenc*32: the polly-detect compile-time percentage is 82.9%. The top five functions reported by perf are:
>> 57.44% opt opt [.] llvm::TypeFinder::run(llvm::Module const&, bool)
>> 11.51% opt opt [.] llvm::TypeFinder::incorporateType(llvm::Type*)
>> 7.54% opt opt [.] llvm::TypeFinder::incorporateValue(llvm::Value const*)
>> 2.66% opt libc-2.17.so [.] 0x0000000000138c02
>> 2.26% opt opt [.] llvm::SlotTracker::processModule()
>>
>> It is surprising that all of the compile time for TypeFinder is counted towards Polly-detect, but I cannot find any calls to TypeFinder in Polly-detect.
>
>Yes, this does not seem very conclusive. We probably need a call graph
>to see where those are called.
>
>Did you try running 'perf record' with the '-g' option? This should give
>you callgraph information, that should be very helpful to track down the
>callers in Polly. Also, if you prefer a graphical view of the
>results, you may want to have a look at Gprof2Dot [1]. Finally, if this
>all does not work, just running Polly in gdb and randomly breaking a
>couple of times (manual sampling), may possibly hint you to the right place.
>

I also tried perf with -g, but it reported nothing useful. The result of perf -g is:
- 48.70% opt opt [.]
llvm::TypeFinder::run(llvm::Module const&, bool)
  - llvm::TypeFinder::run(llvm::Module const&, bool)
    + 43.34% 0
    - 1.78% 0x480031
      + llvm::LoadInst::~LoadInst()
    - 1.41% 0x460031
      + llvm::LoadInst::~LoadInst()
    - 1.01% 0x18
      llvm::BranchInst::~BranchInst()
      0x8348007d97fa3d8d
    - 0.87% 0x233
      + llvm::GetElementPtrInst::~GetElementPtrInst()
    - 0.57% 0x39
      + llvm::SExtInst::~SExtInst()
    - 0.54% 0x460032
      + llvm::StoreInst::~StoreInst()

GDB is a useful tool! Thanks for Sebastian's advice!

By setting a breakpoint on llvm::TypeFinder::run(llvm::Module const&, bool), I find that most of the calls are issued from the following two callsites:
0xb7c1c5d2 in polly::ScopDetection::isValidMemoryAccess(llvm::Instruction&, polly::ScopDetection::DetectionContext&) const ()
0xb7c1d754 in polly::ScopDetection::isValidInstruction(llvm::Instruction&, polly::ScopDetection::DetectionContext&) const ()

The detailed backtrace of "isValidMemoryAccess" is:
#0 0x0907b780 in llvm::TypeFinder::run(llvm::Module const&, bool) ()
#1 0x08f76ebe in llvm::TypePrinting::incorporateTypes(llvm::Module const&) ()
#2 0x08f76fc9 in llvm::AssemblyWriter::init() ()
#3 0x08f77176 in llvm::AssemblyWriter::AssemblyWriter(llvm::formatted_raw_ostream&, llvm::SlotTracker&, llvm::Module const*, llvm::AssemblyAnnotationWriter*) ()
#4 0x08f79d1a in llvm::Value::print(llvm::raw_ostream&, llvm::AssemblyAnnotationWriter*) const ()
#5 0xb7c1d044 in polly::ScopDetection::isValidInstruction(llvm::Instruction&, polly::ScopDetection::DetectionContext&) const () from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so
#6 0xb7c1ea75 in polly::ScopDetection::allBlocksValid(polly::ScopDetection::DetectionContext&) const () from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so
#7 0xb7c1f4aa in polly::ScopDetection::isValidRegion(polly::ScopDetection::DetectionContext&) const () from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so
#8 0xb7c1fd16 in polly::ScopDetection::findScops(llvm::Region&) () from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so
#9 0xb7c1fd81 in polly::ScopDetection::findScops(llvm::Region&) () from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so
#10 0xb7c206f7 in polly::ScopDetection::runOnFunction(llvm::Function&) () from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so
#11 0x09065fdd in llvm::FPPassManager::runOnFunction(llvm::Function&) ()
#12 0x09067e2b in llvm::FunctionPassManagerImpl::run(llvm::Function&) ()
#13 0x09067f6d in llvm::FunctionPassManager::run(llvm::Function&) ()
#14 0x081e6040 in main ()

>Also, can you upload the .ll file somewhere, such that I can access it?
>(Please do not attach it to the email)

I have attached the source code of oggenc.c and oggen.ll in bug 16624:
http://llvm.org/bugs/show_bug.cgi?id=16624

Best wishes,
Star Tan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hack_for_polly_detect.patch
Type: application/octet-stream
Size: 3915 bytes
Desc: not available
URL:

From tanmx_star at yeah.net Sun Jul 14 08:17:01 2013
From: tanmx_star at yeah.net (Star Tan)
Date: Sun, 14 Jul 2013 23:17:01 +0800 (CST)
Subject: [LLVMdev] Analysis of polly-detect overhead in oggenc
In-Reply-To:
References: <51E02B5C.3030208@grosser.es> <727cd06.251b.13fd9059546.Coremail.tanmx_star@yeah.net> <51E19CAF.4050500@grosser.es> <73bc0067.345f.13fdb666d45.Coremail.tanmx_star@yeah.net> <51E2352A.6090706@grosser.es> <7aa4b9a.3bcb.13fdc808c5f.Coremail.tanmx_star@yeah.net>
Message-ID: <67f0dcea.469f.13fddc19f2e.Coremail.tanmx_star@yeah.net>

Hi Sebastian,

Yes, you have pointed out an important reason. If we comment out the source code you listed, the compile-time overhead for oggenc*8.ll can be reduced from 40.5261s (51.2%) to 20.3100s (35.7%).

I just sent another mail to explain why the polly-detect pass leads to significant compile-time overhead. Besides the reason you pointed out, another cause is the string buffer operations in the "INVALID" macro. If we comment out both the string buffer operations in the "INVALID" macro and those in the "isValidMemoryAccess" function, the compile-time overhead for oggenc*8.ll is reduced from 40.5261s (51.2%) to 5.8813s (15.9%).

I think we should revise these string buffer operations in the polly-detect pass.

Best wishes,
Star Tan

At 2013-07-14 23:01:18,"Sebastian Pop" wrote:
>Tobi,
>
>it looks like this code is the problem:
>
>  for (std::vector<Value*>::iterator PI = Pointers.begin(),
>         PE = Pointers.end();
>       ;) {
>    Value *V = *PI;
>
>    if (V->getName().size() == 0)
>      OS << "\"" << *V << "\"";
>    else
>      OS << "\"" << V->getName() << "\"";
>
>    ++PI;
>
>    if (PI != PE)
>      OS << ", ";
>    else
>      break;
>  }
>
>  INVALID_NOVERIFY(Alias, OS.str());
>
>it prints values to OS even when debug is not turned on:
>if you remember, I have already sent a patch to conditionally execute this code
>in a DEBUG() stmt.
You rejected that patch because it would not work for >other reporting tools -polly-show or some other flags. > > >On Sun, Jul 14, 2013 at 4:26 AM, Star Tan wrote: >> >> At 2013-07-14 13:20:42,"Tobias Grosser" wrote: >>>On 07/13/2013 09:18 PM, Star Tan wrote: >>>> >>>> >>>> At 2013-07-14 02:30:07,"Tobias Grosser" wrote: >>>>> On 07/13/2013 10:13 AM, Star Tan wrote: >>>>>> Hi Tobias, >>>>> >>>>> Hi Star, >>>[...] >>>>> Before we write a patch, we should do some profiling to understand where >>>>> the overhead comes from. I propose you generate oggenc*16 or even >>>>> oggen*32 to ensure we get to about 90% Polly-Detect overhead. >>>>> >>>>> I would then run Polly under linux 'perf'. Using 'perf record polly-opt >>>>> ...' and then 'perf report'. If we are lucky, this points us exactly to >>>>> the function we spend all the time in. >>>>> >>>>> Cheers, >>>>> Tobias >>>> >>>> Thanks for your very useful suggestion. >>>> I have profiled the oggenc*16 and oggenc*32 and the results are listed as >>>> follows: >>>> >>>> oggenc*16: polly-detect compile-time percentage is 71.3%. The top five >>>> functions reported by perf are: >>>> 48.97% opt opt [.] >>>> llvm::TypeFinder::run(llvm::Module const&, bool) >>>> 7.43% opt opt [.] >>>> llvm::TypeFinder::incorporateType(llvm::Type*) >>>> 7.36% opt opt [.] >>>> llvm::TypeFinder::incorporateValue(llvm::Value const*) >>>> 4.04% opt libc-2.17.so [.] 0x0000000000138bea >>>> 2.06% opt [kernel.kallsyms] [k] 0xffffffff81043e6a >>>> >>>> oggenc*32: polly-detect compile-time percentage is 82.9%. The top five >>>> functions reported by perf are: >>>> 57.44% opt opt [.] >>>> llvm::TypeFinder::run(llvm::Module const&, bool) >>>> 11.51% opt opt [.] >>>> llvm::TypeFinder::incorporateType(llvm::Type*) >>>> 7.54% opt opt [.] >>>> llvm::TypeFinder::incorporateValue(llvm::Value const*) >>>> 2.66% opt libc-2.17.so [.] 0x0000000000138c02 >>>> 2.26% opt opt [.] 
>>>> llvm::SlotTracker::processModule() >>>> >>>> It is surprise that all compile-time for TypeFinder is added into the >>>> compile-time for Polly-detect, but I cannot find the any call instructions >>>> to TypeFinder in Polly-detect. >>> >>>Yes, this does not seem very conclusive. We probably need a call graph >>>to see where those are called. >>> >>>Did you try running 'perf record' with the '-g' option? This should give >>>you callgraph information, that should be very helpful to track down the >>>callers in Polly. Also, if you prefer a graphical view of the >>>results, you may want to have a look at Gprof2Dot [1]. Finally, if this >>>all does not work, just running Polly in gdb and randomly breaking a >>>couple of times (manual sampling), may possibly hint you to the right >>> place. >>> >> >> I also tried perf with -g, but it report nothing useful. the result of perf >> -g is: >> - 48.70% opt opt [.] >> llvm::TypeFinder::run(llvm::Module const&, bool) >> ` >> >> - llvm::TypeFinder::run(llvm::Module const&, bool) >> + 43.34% 0 >> - 1.78% 0x480031 >> + llvm::LoadInst::~LoadInst() >> - 1.41% 0x460031 >> + llvm::LoadInst::~LoadInst() >> - 1.01% 0x18 >> llvm::BranchInst::~BranchInst() >> 0x8348007d97fa3d8d >> - 0.87% 0x233 >> + llvm::GetElementPtrInst::~GetElementPtrInst() >> - 0.57% 0x39 >> + llvm::SExtInst::~SExtInst() >> - 0.54% 0x460032 >> + llvm::StoreInst::~StoreInst() >> >> >> GDB is a useful tool! Thanks for Sebastian's advice! 
>> >> By setting a break point on llvm::TypeFinder::run(llvm::Module const&, >> bool), I find most of calling cases are issued from the following two >> callsites: >> 0xb7c1c5d2 in polly::ScopDetection::isValidMemoryAccess(llvm::Instruction&, >> polly::ScopDetection::DetectionContext&) const () >> 0xb7c1d754 in polly::ScopDetection::isValidInstruction(llvm::Instruction&, >> polly::ScopDetection::DetectionContext&) const () >> >> The detailed backtrace of "isValidMemoryAccess" is: >> #0 0x0907b780 in llvm::TypeFinder::run(llvm::Module const&, bool) () >> #1 0x08f76ebe in llvm::TypePrinting::incorporateTypes(llvm::Module const&) >> () >> #2 0x08f76fc9 in llvm::AssemblyWriter::init() () >> #3 0x08f77176 in >> llvm::AssemblyWriter::AssemblyWriter(llvm::formatted_raw_ostream&, >> llvm::SlotTracker&, llvm::Module const*, llvm::AssemblyAnnotationWriter*) () >> #4 0x08f79d1a in llvm::Value::print(llvm::raw_ostream&, >> llvm::AssemblyAnnotationWriter*) const () >> #5 0xb7c1d044 in >> polly::ScopDetection::isValidInstruction(llvm::Instruction&, >> polly::ScopDetection::DetectionContext&) const () >> from >> /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so >> #6 0xb7c1ea75 in >> polly::ScopDetection::allBlocksValid(polly::ScopDetection::DetectionContext&) >> const () >> from >> /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so >> #7 0xb7c1f4aa in >> polly::ScopDetection::isValidRegion(polly::ScopDetection::DetectionContext&) >> const () >> from >> /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so >> #8 0xb7c1fd16 in polly::ScopDetection::findScops(llvm::Region&) () >> from >> /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so >> #9 0xb7c1fd81 in polly::ScopDetection::findScops(llvm::Region&) () >> from >> /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so >> #10 0xb7c206f7 in polly::ScopDetection::runOnFunction(llvm::Function&) () >> from >> 
/home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so >> #11 0x09065fdd in llvm::FPPassManager::runOnFunction(llvm::Function&) () >> #12 0x09067e2b in llvm::FunctionPassManagerImpl::run(llvm::Function&) () >> #13 0x09067f6d in llvm::FunctionPassManager::run(llvm::Function&) () >> #14 0x081e6040 in main () >> >> >> >> >Also, can you upload the .ll file somewhere, such that I can access it? >>>(Please do not attach it to the email) >> >> I have attached the source code of oggenc.c and oggen.ll in the bug r16624: >> http://llvm.org/bugs/show_bug.cgi?id=16624 >> >> Best wishes, >> Star Tan >> >> >> >> -- >> You received this message because you are subscribed to the Google Groups >> "Polly Development" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to polly-dev+unsubscribe at googlegroups.com. >> For more options, visit https://groups.google.com/groups/opt_out. >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebpop at gmail.com Sun Jul 14 08:30:31 2013 From: sebpop at gmail.com (Sebastian Pop) Date: Sun, 14 Jul 2013 10:30:31 -0500 Subject: [LLVMdev] Analysis of polly-detect overhead in oggenc In-Reply-To: <67f0dcea.469f.13fddc19f2e.Coremail.tanmx_star@yeah.net> References: <51E02B5C.3030208@grosser.es> <727cd06.251b.13fd9059546.Coremail.tanmx_star@yeah.net> <51E19CAF.4050500@grosser.es> <73bc0067.345f.13fdb666d45.Coremail.tanmx_star@yeah.net> <51E2352A.6090706@grosser.es> <7aa4b9a.3bcb.13fdc808c5f.Coremail.tanmx_star@yeah.net> <67f0dcea.469f.13fddc19f2e.Coremail.tanmx_star@yeah.net> Message-ID: On Sun, Jul 14, 2013 at 10:17 AM, Star Tan wrote: > Hi Sebastian, > > Yes, you have pointed an important reason. If we comment this source code > you have listed, then the compile-time overhead for oggenc*8.ll can be > reduced from 40.5261 ( 51.2%) to 20.3100 ( 35.7%). 
>
> I just sent another mail to explain why polly-detect pass leads to
> significant compile-time overhead. Besides the reason you have pointed,
> another reason is resulted from those string buffer operations in "INVALID"
> MACRO. If we comment both the string buffer operations in "INVALID" MACRO
> and in the "isValidMemoryAccess" function, the compile-time overhead for
> oggenc*8.ll would be reduced from 40.5261 ( 51.2%) to 5.8813s (15.9%).

Awesome, thanks for the analysis.

Can you run perf again on the resulting program: I would still like to understand where we spend the 5.88s in the rest of scop detection.

> I think we should revise these string buffer operations in Polly-detect
> pass.

Right: we should incur no overhead in a non-debug build.

Thanks,
Sebastian

From ofv at wanadoo.es Sun Jul 14 08:41:29 2013
From: ofv at wanadoo.es (Óscar Fuentes)
Date: Sun, 14 Jul 2013 17:41:29 +0200
Subject: [LLVMdev] Windows reviewers still needed?
References:
Message-ID: <87bo65f8t2.fsf@wanadoo.es>

Richard writes:

> I watched some LLVM videos yesterday and there was one where Chandler
> Carruth was saying that LLVM/clang needed more Windows savvy
> developers to review patches.
>
> Is this still the case? If so, how do I contribute?

Just to be more verbose than Anton... :-)

Subscribe to the llvm-commits and cfe-commits mailing lists, watch for patches touching Windows functionality, and reply to them with your comments.
From tanmx_star at yeah.net Sun Jul 14 08:50:27 2013 From: tanmx_star at yeah.net (Star Tan) Date: Sun, 14 Jul 2013 23:50:27 +0800 (CST) Subject: [LLVMdev] Analysis of polly-detect overhead in oggenc In-Reply-To: References: <51E02B5C.3030208@grosser.es> <727cd06.251b.13fd9059546.Coremail.tanmx_star@yeah.net> <51E19CAF.4050500@grosser.es> <73bc0067.345f.13fdb666d45.Coremail.tanmx_star@yeah.net> <51E2352A.6090706@grosser.es> <7aa4b9a.3bcb.13fdc808c5f.Coremail.tanmx_star@yeah.net> <67f0dcea.469f.13fddc19f2e.Coremail.tanmx_star@yeah.net> Message-ID: <70b11c2b.4788.13fdde03a3e.Coremail.tanmx_star@yeah.net> At 2013-07-14 23:30:31,"Sebastian Pop" wrote: >On Sun, Jul 14, 2013 at 10:17 AM, Star Tan wrote: >> Hi Sebastian, >> >> Yes, you have pointed an important reason. If we comment this source code >> you have listed, then the compile-time overhead for oggenc*8.ll can be >> reduced from 40.5261 ( 51.2%) to 20.3100 ( 35.7%). >> >> I just sent another mail to explain why polly-detect pass leads to >> significant compile-time overhead. Besides the reason you have pointed, >> another reason is resulted from those string buffer operations in "INVALID" >> MACRO. If we comment both the string buffer operations in "INVALID" MACRO >> and in the "isValidMemoryAccess" function, the compile-time overhead for >> oggenc*8.ll would be reduced from 40.5261 ( 51.2%) to 5.8813s (15.9%). > >Awesome, thanks for the analysis. > >Can you run again perf on the resulting program: I would still like to >understand >where we spend the 5.88s in the rest of scop detection. > The top ten functions reported by perf are: + 12.68% opt [kernel.kallsyms] [k] 0xc111472c + 9.40% opt libc-2.17.so [.] 0x0007875f + 2.98% opt opt [.] __x86.get_pc_thunk.bx + 2.42% opt [vdso] [.] 0x00000425 + 1.46% opt libgmp.so.10.0.5 [.] __gmpn_copyi_x86 + 1.11% opt libc-2.17.so [.] free + 1.07% opt opt [.] bool llvm::DenseMapBase + 1.02% opt opt [.] 
llvm::ComputeMaskedBits(llvm::Value*, llvm::APInt&, llvm::APInt&, llvm::DataLayout const*, unsigna
+ 1.00% opt opt [.] llvm::Use::getImpliedUser() const
+ 0.90% opt libgmp.so.10.0.5 [.] __gmpz_set
+ 0.76% opt opt [.] llvm::SmallPtrSetImpl::insert_imp(void const*)
+ 0.74% opt opt [.] llvm::InstCombiner::DoOneIteration(llvm::Function&, unsigned int)
+ 0.73% opt opt [.] llvm::PassRegistry::getPassInfo(void const*) const
+ 0.72% opt libc-2.17.so [.] malloc
+ 0.72% opt opt [.] llvm::TimeRecord::getCurrentTime(bool)
+ 0.71% opt opt [.] llvm::ValueHandleBase::AddToUseList()
+ 0.57% opt opt [.] llvm::SlotTracker::processModule()
+ 0.52% opt opt [.] llvm::PMTopLevelManager::findAnalysisPass(void const*)
+ 0.51% opt opt [.] llvm::APInt::~APInt()
+ 0.51% opt libgmp.so.10.0.5 [.] __gmpz_mul

Unfortunately, I cannot set breakpoints for the top two functions. Even with "perf -g", I still cannot track where the time is spent in the polly-detect pass. The "perf -g" results are like this:
- 12.68% opt [kernel.kallsyms] [k] 0xc111472c
  - 0xc161984a
    - 0xb7783424
      - 99.93% 0x9ab2000
        + 85.99% 0
        + 3.14% 0xb46d018
        + 1.42% 0xb7745000
      - 0xc1612415
        + 97.76% 0xc1062977
        + 1.75% 0xc1495888

Do you have some suggestions?

Best wishes,
Star Tan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From artagnon at gmail.com Sun Jul 14 05:56:02 2013
From: artagnon at gmail.com (Ramkumar Ramachandra)
Date: Sun, 14 Jul 2013 18:26:02 +0530
Subject: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix
Message-ID: <1373806562-30422-1-git-send-email-artagnon@gmail.com>

1c54d77 (x86: partial unification of asm-x86/bitops.h, 2008-01-30) changed a bunch of btrl/btsl instructions to btr/bts, with the following justification:

    The inline assembly for the bit operations has been changed to remove
    explicit sizing hints on the instructions, so the assembler will pick
    the appropriate instruction forms depending on the architecture and
    the context.
Unfortunately, GNU as does no such thing, and the AT&T syntax manual [1] contains no references to any such inference. As evidenced by the following experiment, gas always disambiguates btr/bts to btrl/btsl. Feed the following input to gas: btrl $1, 0 btr $1, 0 btsl $1, 0 bts $1, 0 Check that btr matches btrl, and bts matches btsl in both cases: $ as --32 -a in.s $ as --64 -a in.s To avoid giving readers the illusion of such an inference, and for clarity, change btr/bts back to btrl/btsl. Also, llvm-mc refuses to disambiguate btr/bts automatically. [1]: http://docs.oracle.com/cd/E19253-01/817-5477/817-5477.pdf Cc: Jeremy Fitzhardinge Cc: Andi Kleen Cc: Linus Torvalds Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Eli Friedman Cc: Jim Grosbach Cc: Stephen Checkoway Cc: LLVMdev Signed-off-by: Ramkumar Ramachandra --- We discussed this pretty extensively on LLVMDev, but I'm still not sure that I haven't missed something. arch/x86/include/asm/bitops.h | 16 ++++++++-------- arch/x86/include/asm/percpu.h | 2 +- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h index 6dfd019..6ed3d1e 100644 --- a/arch/x86/include/asm/bitops.h +++ b/arch/x86/include/asm/bitops.h @@ -67,7 +67,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr) : "iq" ((u8)CONST_MASK(nr)) : "memory"); } else { - asm volatile(LOCK_PREFIX "bts %1,%0" + asm volatile(LOCK_PREFIX "btsl %1,%0" : BITOP_ADDR(addr) : "Ir" (nr) : "memory"); } } @@ -83,7 +83,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr) */ static inline void __set_bit(int nr, volatile unsigned long *addr) { - asm volatile("bts %1,%0" : ADDR : "Ir" (nr) : "memory"); + asm volatile("btsl %1,%0" : ADDR : "Ir" (nr) : "memory"); } /** @@ -104,7 +104,7 @@ clear_bit(int nr, volatile unsigned long *addr) : CONST_MASK_ADDR(nr, addr) : "iq" ((u8)~CONST_MASK(nr))); } else { - asm volatile(LOCK_PREFIX "btr %1,%0" + asm volatile(LOCK_PREFIX "btrl %1,%0" : BITOP_ADDR(addr) : 
"Ir" (nr)); } @@ -126,7 +126,7 @@ static inline void clear_bit_unlock(unsigned nr, volatile unsigned long *addr) static inline void __clear_bit(int nr, volatile unsigned long *addr) { - asm volatile("btr %1,%0" : ADDR : "Ir" (nr)); + asm volatile("btrl %1,%0" : ADDR : "Ir" (nr)); } /* @@ -198,7 +198,7 @@ static inline int test_and_set_bit(int nr, volatile unsigned long *addr) { int oldbit; - asm volatile(LOCK_PREFIX "bts %2,%1\n\t" + asm volatile(LOCK_PREFIX "btsl %2,%1\n\t" "sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr) : "memory"); return oldbit; @@ -230,7 +230,7 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr) { int oldbit; - asm("bts %2,%1\n\t" + asm("btsl %2,%1\n\t" "sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr)); @@ -249,7 +249,7 @@ static inline int test_and_clear_bit(int nr, volatile unsigned long *addr) { int oldbit; - asm volatile(LOCK_PREFIX "btr %2,%1\n\t" + asm volatile(LOCK_PREFIX "btrl %2,%1\n\t" "sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr) : "memory"); @@ -276,7 +276,7 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr) { int oldbit; - asm volatile("btr %2,%1\n\t" + asm volatile("btrl %2,%1\n\t" "sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr)); diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h index 0da5200..fda54c9 100644 --- a/arch/x86/include/asm/percpu.h +++ b/arch/x86/include/asm/percpu.h @@ -490,7 +490,7 @@ do { \ #define x86_test_and_clear_bit_percpu(bit, var) \ ({ \ int old__; \ - asm volatile("btr %2,"__percpu_arg(1)"\n\tsbbl %0,%0" \ + asm volatile("btrl %2,"__percpu_arg(1)"\n\tsbbl %0,%0" \ : "=r" (old__), "+m" (var) \ : "dIr" (bit)); \ old__; \ -- 1.8.3.2.736.g869de25 From baldrick at free.fr Sun Jul 14 09:39:23 2013 From: baldrick at free.fr (Duncan Sands) Date: Sun, 14 Jul 2013 18:39:23 +0200 Subject: [LLVMdev] Inlined call properly optimized, but not function itself In-Reply-To: <51E0D159.1090402@disemia.com> References: <51E0D159.1090402@disemia.com> 
Message-ID: <51E2D43B.8010805@free.fr>

Hi edA-qa mort-ora-y,

> define i64 @eval_expr() uwtable {
> entry:
> %0 = extractvalue %0 { i1 true, i64 3 }, 0

It does seem odd that this has not been turned into "i1 true". The instcombine pass, which is run many times in the standard optimization pipeline, will get this. Are you running it?

Ciao, Duncan.

From torvalds at linux-foundation.org Sun Jul 14 10:19:20 2013
From: torvalds at linux-foundation.org (Linus Torvalds)
Date: Sun, 14 Jul 2013 10:19:20 -0700
Subject: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix
In-Reply-To: <1373806562-30422-1-git-send-email-artagnon@gmail.com>
References: <1373806562-30422-1-git-send-email-artagnon@gmail.com>
Message-ID:

On Sun, Jul 14, 2013 at 5:56 AM, Ramkumar Ramachandra wrote:
> 1c54d77 (x86: partial unification of asm-x86/bitops.h, 2008-01-30)
> changed a bunch of btrl/btsl instructions to btr/bts, with the following
> justification:
>
> The inline assembly for the bit operations has been changed to remove
> explicit sizing hints on the instructions, so the assembler will pick
> the appropriate instruction forms depending on the architecture and
> the context.
>
> Unfortunately, GNU as does no such thing

Yes it does.

> btrl $1, 0
> btr $1, 0
> btsl $1, 0
> bts $1, 0

What the heck is that supposed to show? It shows nothing at all. With an argument of '1', *of*course* gas will use "btsl", since that's the short form. Using the rex-prefix and a btsq would be *stupid*. So gas will pick the appropriate form, exactly as claimed.

Try some actual relevant test instead:

bt %eax,mem
bt %rax,mem

and notice how they are actually fundamentally different. Test-case:
Now, there are possible cases where you want to make the size explicit because you are mixing memory operand sizes and there can be nasty performance implications of doing a 32-bit write and then doing a 64-bit read of the result. I'm not actually aware of us having ever worried/cared about it, but it's a possible source of trouble: mixing bitop instructions with non-bitop instructions can have some subtle interactions, and you need to be careful, since the size of the operand affects both the offset *and* the memory access size. The access size generally is meaningless from a semantic standpoint (little-endian being the only sane model), but the access size *can* have performance implications for the write queue forwarding. Linus From anton at korobeynikov.info Sun Jul 14 11:24:20 2013 From: anton at korobeynikov.info (Anton Korobeynikov) Date: Sun, 14 Jul 2013 22:24:20 +0400 Subject: [LLVMdev] Enabling the SLP vectorizer by default for -O3 In-Reply-To: References: Message-ID: > MultiSource/Benchmarks/Olden/bh/bh 22.47% > MultiSource/Benchmarks/Bullet/bullet 7.31% Looks like quite big regressions. Any idea, why? -- With best regards, Anton Korobeynikov Faculty of Mathematics and Mechanics, Saint Petersburg State University From t.p.northover at gmail.com Sun Jul 14 11:35:20 2013 From: t.p.northover at gmail.com (Tim Northover) Date: Sun, 14 Jul 2013 18:35:20 +0000 Subject: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix In-Reply-To: References: <1373806562-30422-1-git-send-email-artagnon@gmail.com> Message-ID: Hi, The issue perhaps wasn't explained ideally (and possibly shouldn't have been CCed directly to you either, so apologies, but now that there *is* a discussion...) > Try some actual relevant test instead: > > bt %eax,mem > bt %rax,mem > > and notice how they are actually fundamentally different. Test-case: I'm coming at this from the compiler side, where the register form is unambiguous and not questioned. 
The discussion we're having involves only the immediate form of the instruction. GNU as interprets: bt $63, mem as btl $63, mem which may or may not be what the user intended, but is not the same as "btq $63, mem". I'm not an official LLVM spokesperson or anything, but our consensus seems to be that "bt $imm, whatever" is ambiguous (the %eax and %rax versions you quoted disambiguate the width) and should be disallowed by the assembler. The patch we're replying to implements that as a NOP fix to the kernel (GNU as always treats "bt" with an immediate as "btl"). I don't believe there's any situation in which it will produce different code, but it will allow Clang to compile (this part of) the kernel. There is, however, a potential optimisation here for someone who knows their inline asm. Currently "set_bit(63, addr)" will use the "r" version of the constraint even on amd64 targets, materialising 63 with a "movl". With sufficiently clever faff, it could use "btsq" instead. Cheers. Tim. From t.p.northover at gmail.com Sun Jul 14 12:30:00 2013 From: t.p.northover at gmail.com (Tim Northover) Date: Sun, 14 Jul 2013 19:30:00 +0000 Subject: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix In-Reply-To: References: <1373806562-30422-1-git-send-email-artagnon@gmail.com> Message-ID: > And that is why I think you should just consider "bt $x,y" to be > trivially the same thing and not at all ambiguous. Because there is > ABSOLUTELY ZERO ambiguity when people write > > bt $63, mem > > Zero. Nada. None. The semantics are *exactly* the same for btl and btq > in this case, so why would you want the user to specify one or the > other? I don't think you've actually tested that, have you? (x86-64) int main() { long val = 0xffffffff; char res; asm("btl $63, %1\n\tsetc %0" : "=r"(res) : "m"(val)); printf("%d\n", res); asm("btq $63, %1\n\tsetc %0" : "=r"(res) : "m"(val)); printf("%d\n", res); } Tim. 
From jeremy at goop.org Sun Jul 14 12:41:06 2013
From: jeremy at goop.org (Jeremy Fitzhardinge)
Date: Sun, 14 Jul 2013 12:41:06 -0700
Subject: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix
In-Reply-To:
References: <1373806562-30422-1-git-send-email-artagnon@gmail.com>
Message-ID: <51E2FED2.7010306@goop.org>

On 07/14/2013 12:30 PM, Tim Northover wrote:
>> And that is why I think you should just consider "bt $x,y" to be
>> trivially the same thing and not at all ambiguous. Because there is
>> ABSOLUTELY ZERO ambiguity when people write
>>
>> bt $63, mem
>>
>> Zero. Nada. None. The semantics are *exactly* the same for btl and btq
>> in this case, so why would you want the user to specify one or the
>> other?
> I don't think you've actually tested that, have you? (x86-64)
>
> int main() {
> long val = 0xffffffff;
> char res;
>
> asm("btl $63, %1\n\tsetc %0" : "=r"(res) : "m"(val));
> printf("%d\n", res);
>
> asm("btq $63, %1\n\tsetc %0" : "=r"(res) : "m"(val));
> printf("%d\n", res);
> }

Blerk. It doesn't undermine the original point - that gas can unambiguously choose the right operation size for a constant bit offset - but yes, the operation size is meaningful in the case of an immediate bit offset. It's pretty nasty of Intel to hide that detail in Table 3-2, far from the instructions which use it...

J

>
> Tim.
>

From jeremy at goop.org Sun Jul 14 12:10:38 2013
From: jeremy at goop.org (Jeremy Fitzhardinge)
Date: Sun, 14 Jul 2013 12:10:38 -0700
Subject: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix
In-Reply-To: <1373806562-30422-1-git-send-email-artagnon@gmail.com>
References: <1373806562-30422-1-git-send-email-artagnon@gmail.com>
Message-ID: <51E2F7AE.6030902@goop.org>

On 07/14/2013 05:56 AM, Ramkumar Ramachandra wrote:
> 1c54d77 (x86: partial unification of asm-x86/bitops.h, 2008-01-30)
> changed a bunch of btrl/btsl instructions to btr/bts, with the following
> justification:
>
> The inline assembly for the bit operations has been changed to remove
> explicit sizing hints on the instructions, so the assembler will pick
> the appropriate instruction forms depending on the architecture and
> the context.
>
> Unfortunately, GNU as does no such thing, and the AT&T syntax manual
> [1] contains no references to any such inference. As evidenced by the
> following experiment, gas always disambiguates btr/bts to btrl/btsl.
> Feed the following input to gas:
>
> btrl $1, 0
> btr $1, 0
> btsl $1, 0
> bts $1, 0

When I originally did those patches, I was careful to make sure that we didn't give implied sizes to operations with only immediate and/or memory operands because - in general - gas can't infer the operation size from such operands. However, in the case of the bit test/set operations, the memory access size is not really derived from the operation size (the SDM is a bit vague), and even if it were, it would be an operational rather than a semantic difference. So there's no real problem with gas choosing 'l' as a default size in the absence of any explicit override or constraint.

> Check that btr matches btrl, and bts matches btsl in both cases:
>
> $ as --32 -a in.s
> $ as --64 -a in.s
>
> To avoid giving readers the illusion of such an inference, and for
> clarity, change btr/bts back to btrl/btsl. Also, llvm-mc refuses to
> disambiguate btr/bts automatically.
That sounds reasonable for all other operations because it makes a real semantic difference, but overly strict for bit operations. J > > [1]: http://docs.oracle.com/cd/E19253-01/817-5477/817-5477.pdf > > Cc: Jeremy Fitzhardinge > Cc: Andi Kleen > Cc: Linus Torvalds > Cc: Ingo Molnar > Cc: Thomas Gleixner > Cc: Eli Friedman > Cc: Jim Grosbach > Cc: Stephen Checkoway > Cc: LLVMdev > Signed-off-by: Ramkumar Ramachandra > --- > We discussed this pretty extensively on LLVMDev, but I'm still not > sure that I haven't missed something. > > arch/x86/include/asm/bitops.h | 16 ++++++++-------- > arch/x86/include/asm/percpu.h | 2 +- > 2 files changed, 9 insertions(+), 9 deletions(-) > > diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h > index 6dfd019..6ed3d1e 100644 > --- a/arch/x86/include/asm/bitops.h > +++ b/arch/x86/include/asm/bitops.h > @@ -67,7 +67,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr) > : "iq" ((u8)CONST_MASK(nr)) > : "memory"); > } else { > - asm volatile(LOCK_PREFIX "bts %1,%0" > + asm volatile(LOCK_PREFIX "btsl %1,%0" > : BITOP_ADDR(addr) : "Ir" (nr) : "memory"); > } > } > @@ -83,7 +83,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr) > */ > static inline void __set_bit(int nr, volatile unsigned long *addr) > { > - asm volatile("bts %1,%0" : ADDR : "Ir" (nr) : "memory"); > + asm volatile("btsl %1,%0" : ADDR : "Ir" (nr) : "memory"); > } > > /** > @@ -104,7 +104,7 @@ clear_bit(int nr, volatile unsigned long *addr) > : CONST_MASK_ADDR(nr, addr) > : "iq" ((u8)~CONST_MASK(nr))); > } else { > - asm volatile(LOCK_PREFIX "btr %1,%0" > + asm volatile(LOCK_PREFIX "btrl %1,%0" > : BITOP_ADDR(addr) > : "Ir" (nr)); > } > @@ -126,7 +126,7 @@ static inline void clear_bit_unlock(unsigned nr, volatile unsigned long *addr) > > static inline void __clear_bit(int nr, volatile unsigned long *addr) > { > - asm volatile("btr %1,%0" : ADDR : "Ir" (nr)); > + asm volatile("btrl %1,%0" : ADDR : "Ir" (nr)); > } > > /* > @@ -198,7 
+198,7 @@ static inline int test_and_set_bit(int nr, volatile unsigned long *addr) > { > int oldbit; > > - asm volatile(LOCK_PREFIX "bts %2,%1\n\t" > + asm volatile(LOCK_PREFIX "btsl %2,%1\n\t" > "sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr) : "memory"); > > return oldbit; > @@ -230,7 +230,7 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr) > { > int oldbit; > > - asm("bts %2,%1\n\t" > + asm("btsl %2,%1\n\t" > "sbb %0,%0" > : "=r" (oldbit), ADDR > : "Ir" (nr)); > @@ -249,7 +249,7 @@ static inline int test_and_clear_bit(int nr, volatile unsigned long *addr) > { > int oldbit; > > - asm volatile(LOCK_PREFIX "btr %2,%1\n\t" > + asm volatile(LOCK_PREFIX "btrl %2,%1\n\t" > "sbb %0,%0" > : "=r" (oldbit), ADDR : "Ir" (nr) : "memory"); > > @@ -276,7 +276,7 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr) > { > int oldbit; > > - asm volatile("btr %2,%1\n\t" > + asm volatile("btrl %2,%1\n\t" > "sbb %0,%0" > : "=r" (oldbit), ADDR > : "Ir" (nr)); > diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h > index 0da5200..fda54c9 100644 > --- a/arch/x86/include/asm/percpu.h > +++ b/arch/x86/include/asm/percpu.h > @@ -490,7 +490,7 @@ do { \ > #define x86_test_and_clear_bit_percpu(bit, var) \ > ({ \ > int old__; \ > - asm volatile("btr %2,"__percpu_arg(1)"\n\tsbbl %0,%0" \ > + asm volatile("btrl %2,"__percpu_arg(1)"\n\tsbbl %0,%0" \ > : "=r" (old__), "+m" (var) \ > : "dIr" (bit)); \ > old__; \ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jeremy at goop.org Sun Jul 14 12:10:42 2013 From: jeremy at goop.org (Jeremy Fitzhardinge) Date: Sun, 14 Jul 2013 12:10:42 -0700 Subject: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix In-Reply-To: References: <1373806562-30422-1-git-send-email-artagnon@gmail.com> Message-ID: <51E2F7B2.7040608@goop.org> On 07/14/2013 10:19 AM, Linus Torvalds wrote: > Now, there are possible cases where you want to make the size explicit > because you are mixing memory operand sizes and there can be nasty > performance implications of doing a 32-bit write and then doing a > 64-bit read of the result. I'm not actually aware of us having ever > worried/cared about it, but it's a possible source of trouble: mixing > bitop instructions with non-bitop instructions can have some subtle > interactions, and you need to be careful, since the size of the > operand affects both the offset *and* the memory access size. The SDM entry for BT mentions that the instruction may touch 2 or 4 bytes depending on the operand size, but doesn't specifically mention that a 64 bit operation size touches 8 bytes - and it doesn't mention anything at all about operand size and access size in BTR/BTS/BTC (unless it's implied as part of the discussion about encoding the MSBs of a constant bit offset in the offset of the addressing mode). Is that an oversight? > The > access size generally is meaningless from a semantic standpoint > (little-endian being the only sane model), but the access size *can* > have performance implications for the write queue forwarding. It looks like that if the base address isn't aligned then neither is the generated access, so you could get a protection fault if it overlaps a page boundary, which is a semantic rather than purely operational difference. J -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From artagnon at gmail.com Sun Jul 14 11:26:10 2013 From: artagnon at gmail.com (Ramkumar Ramachandra) Date: Sun, 14 Jul 2013 23:56:10 +0530 Subject: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix In-Reply-To: References: <1373806562-30422-1-git-send-email-artagnon@gmail.com> Message-ID: Linus Torvalds wrote: >> btrl $1, 0 >> btr $1, 0 >> btsl $1, 0 >> bts $1, 0 > > What the heck is that supposed to show? I was trying to show a reduced case where gas doesn't complain, but llvm-mc does. Try compiling this with llvm-mc, and you'll get: .text btrl $1, 0 in.s:2:1: error: ambiguous instructions require an explicit suffix (could be 'btrw', 'btrl', or 'btrq') btr $1, 0 ^ btsl $1, 0 in.s:4:1: error: ambiguous instructions require an explicit suffix (could be 'btsw', 'btsl', or 'btsq') bts $1, 0 ^ Obviously, I misunderstood something major and screwed up the commit message. > int main(int argc, char **argv) > { > asm("bt %1,%0":"=m" (**argv): "a" (argc)); > asm("bt %1,%0":"=m" (**argv): "a" ((unsigned long)(argc))); > } Right, so in: int main(int argc, char **argv) { asm("bts %1,%0":"=m" (**argv): "r" (argc)); asm("btsl %1,%0":"=m" (**argv): "r" (argc)); asm("btr %1,%0":"=m" (**argv): "r" ((unsigned long)(argc))); asm("btrq %1,%0":"=m" (**argv): "r" ((unsigned long)(argc))); } bts disambiguates to btsl, and btr disambiguates to btrq, as advertised. Is it dependent on whether I have a 32-bit machine or 64-bit machine, or just on the operand lengths? Either way, this is not a very enlightening example, because clang also compiles this fine, and doesn't complain about any ambiguity. 
To see the ambiguity I'm talking about, try to compile linux.git with clang; I'll paste one error: arch/x86/include/asm/bitops.h:129:15: error: ambiguous instructions require an explicit suffix (could be 'btrw', 'btrl', or 'btrq') asm volatile("btr %1,%0" : ADDR : "Ir" (nr)); ^ :1:2: note: instantiated into assembly here btr $0,(%rsi) ^ Since nr is an int, and ADDR is *(volatile long *), this should disambiguate to btrl, right? Any clue why clang is complaining? From torvalds at linux-foundation.org Sun Jul 14 11:34:17 2013 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Sun, 14 Jul 2013 11:34:17 -0700 Subject: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix In-Reply-To: References: <1373806562-30422-1-git-send-email-artagnon@gmail.com> Message-ID: On Sun, Jul 14, 2013 at 11:26 AM, Ramkumar Ramachandra wrote: > > I was trying to show a reduced case where gas doesn't complain, but > llvm-mc does. Try compiling this with llvm-mc, and you'll get: Ok. So your commit message and explanation was pure and utter tripe, and the real reason you want this is that llvm-mc is broken. Please fix llvm-mc instead, ok? If the intent of llvm is to be compatible with the gnu compiler tools, then it should do that. Plus the gas behavior is clearly superior, so why not just improve the llvm toolchain to match those improved semantics? Linus From artagnon at gmail.com Sun Jul 14 11:49:05 2013 From: artagnon at gmail.com (Ramkumar Ramachandra) Date: Mon, 15 Jul 2013 00:19:05 +0530 Subject: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix In-Reply-To: References: <1373806562-30422-1-git-send-email-artagnon@gmail.com> Message-ID: Linus Torvalds wrote: > Ok. So your commit message and explanation was pure and utter tripe, > and the real reason you want this is that llvm-mc is broken. > > Please fix llvm-mc instead, ok? If the intent of llvm is to be > compatible with the gnu compiler tools, then it should do that. 
Plus > the gas behavior is clearly superior, so why not just improve the llvm > toolchain to match those improved semantics? Yep. I started the discussion on LLVMDev, and posted patches [1]. From the discussions on the list, many of the devs are claiming that LLVM is "correct" and that linux.git needs to be patched. I'm not taking sides; I just want a solution to the problem. [1]: The archive is broken, but here are some pieces: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130708/180968.html From torvalds at linux-foundation.org Sun Jul 14 12:09:21 2013 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Sun, 14 Jul 2013 12:09:21 -0700 Subject: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix In-Reply-To: References: <1373806562-30422-1-git-send-email-artagnon@gmail.com> Message-ID: On Sun, Jul 14, 2013 at 11:35 AM, Tim Northover wrote: > > I'm coming at this from the compiler side, where the register form is > unambiguous and not questioned. The discussion we're having involves > only the immediate form of the instruction. GNU as interprets: > > bt $63, mem > > as > btl $63, mem > > which may or may not be what the user intended, but is not the same as > "btq $63, mem". Umm. The user doesn't care. The user wants the best code without having to worry about it. Think of it this way: the whole and ONLY point of an assembler is to make machine code reasonably easy to write, by not having to worry about the exact encoding details. We don't want the users specifying the hex representation of the instructions, do we? Or even details like "what is the most efficient form of this instruction". For example, think about branch offsets and immediates. Most architectures have some limits about how long branch offsets or immediates are, and a short branch offset may use TOTALLY DIFFERENT instruction encoding than a long branch offset. Do you really expect that the user says "jnel" for the long form of the "jne" instruction? 
And "jnes" if you want the smaller/faster/simpler 8-bit version? No sane person actually wants that, and no modern assembler does that (although I can remember ones that did - ugh). You write "jne target" and depend on the assembler doing the right thing. Or you write "add $5,%eax", and depend on the fact that the assembler will use the much shorter version of the "add" instruction that just takes a 8-bit signed value instead of the full 32-bit immediate. Or any number of details like this ("there are special versions that only work on %eax" etc rules) And that is why I think you should just consider "bt $x,y" to be trivially the same thing and not at all ambiguous. Because there is ABSOLUTELY ZERO ambiguity when people write bt $63, mem Zero. Nada. None. The semantics are *exactly* the same for btl and btq in this case, so why would you want the user to specify one or the other? The user may be knowledgeable about the architecture, and know that "btl" is one byte shorter than "btq", and use "btl" for that reason. You seem to argue that that is the "right thing"(tm) to do, since that's what the instruction encoding will be. But if that's the case, then you are arguing that "jne target" is "ambiguous" because there are two different ways to encode that too? Do you seriously argue that? So I'm arguing that that is wrong for an assembler to not just do the right thing, because the user isn't *supposed* to have to know about things like "one byte shorter encoding format". And there really is no semantic difference between the two forms. So making the user specify the size is just going to cause problems (in particular, it might well make users go "this is an array of 64-bit entities, so I should use btq", even though that is actually incorrect). 
Now, I obviously think that the user should have the choice to *override* the default thing, so sometimes you might have /* We use a 64-bit btsq to encourage the CPU to do it as a 64-bit read-modify-write, since we will do a 64-bit read of the result later, and otherwise we'll get a partial write buffer stall */ btsq $63, mem and then the assembler had obviously better use the size information the user gave it. But the thing is, this is basically never a concern in practice, and when it is, the assembler really cannot know (it could go either way: maybe the bts is following a 32-bit write, and you want the 32-bit version - and I suspect that the likelihood of most users getting this right by hand is quite low too). (Side note: I'm not even going to guarantee that the actual CPU uses the operand size for the memory access size. The manuals imply they do, but since there are no real semantic reasons to enforce that, I could imagine that some microarchitecture doesn't actually care). Linus From jeremy at goop.org Sun Jul 14 12:23:37 2013 From: jeremy at goop.org (Jeremy Fitzhardinge) Date: Sun, 14 Jul 2013 12:23:37 -0700 Subject: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix In-Reply-To: <1373806562-30422-1-git-send-email-artagnon@gmail.com> References: <1373806562-30422-1-git-send-email-artagnon@gmail.com> Message-ID: <51E2FAB9.9050900@goop.org> (resent without HTML) On 07/14/2013 05:56 AM, Ramkumar Ramachandra wrote: > 1c54d77 (x86: partial unification of asm-x86/bitops.h, 2008-01-30) > changed a bunch of btrl/btsl instructions to btr/bts, with the following > justification: > > The inline assembly for the bit operations has been changed to remove > explicit sizing hints on the instructions, so the assembler will pick > the appropriate instruction forms depending on the architecture and > the context. > > Unfortunately, GNU as does no such thing, and the AT&T syntax manual > [1] contains no references to any such inference. 
As evidenced by the > following experiment, gas always disambiguates btr/bts to btrl/btsl. > Feed the following input to gas: > > btrl $1, 0 > btr $1, 0 > btsl $1, 0 > bts $1, 0 When I originally did those patches, I was careful to make sure that we didn't give implied sizes to operations with only immediate and/or memory operands because - in general - gas can't infer the operation size from such operands. However, in the case of the bit test/set operations, the memory access size is not really derived from the operation size (the SDM is a bit vague), and even if it were, it would be an operational rather than a semantic difference. So there's no real problem with gas choosing 'l' as a default size in the absence of any explicit override or constraint. > Check that btr matches btrl, and bts matches btsl in both cases: > > $ as --32 -a in.s > $ as --64 -a in.s > > To avoid giving readers the illusion of such an inference, and for > clarity, change btr/bts back to btrl/btsl. Also, llvm-mc refuses to > disambiguate btr/bts automatically. That sounds reasonable for all other operations because it makes a real semantic difference, but overly strict for bit operations. J > [1]: http://docs.oracle.com/cd/E19253-01/817-5477/817-5477.pdf > > Cc: Jeremy Fitzhardinge > Cc: Andi Kleen > Cc: Linus Torvalds > Cc: Ingo Molnar > Cc: Thomas Gleixner > Cc: Eli Friedman > Cc: Jim Grosbach > Cc: Stephen Checkoway > Cc: LLVMdev > Signed-off-by: Ramkumar Ramachandra > --- > We discussed this pretty extensively on LLVMDev, but I'm still not > sure that I haven't missed something. 
> > arch/x86/include/asm/bitops.h | 16 ++++++++-------- > arch/x86/include/asm/percpu.h | 2 +- > 2 files changed, 9 insertions(+), 9 deletions(-) > > diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h > index 6dfd019..6ed3d1e 100644 > --- a/arch/x86/include/asm/bitops.h > +++ b/arch/x86/include/asm/bitops.h > @@ -67,7 +67,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr) > : "iq" ((u8)CONST_MASK(nr)) > : "memory"); > } else { > - asm volatile(LOCK_PREFIX "bts %1,%0" > + asm volatile(LOCK_PREFIX "btsl %1,%0" > : BITOP_ADDR(addr) : "Ir" (nr) : "memory"); > } > } > @@ -83,7 +83,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr) > */ > static inline void __set_bit(int nr, volatile unsigned long *addr) > { > - asm volatile("bts %1,%0" : ADDR : "Ir" (nr) : "memory"); > + asm volatile("btsl %1,%0" : ADDR : "Ir" (nr) : "memory"); > } > > /** > @@ -104,7 +104,7 @@ clear_bit(int nr, volatile unsigned long *addr) > : CONST_MASK_ADDR(nr, addr) > : "iq" ((u8)~CONST_MASK(nr))); > } else { > - asm volatile(LOCK_PREFIX "btr %1,%0" > + asm volatile(LOCK_PREFIX "btrl %1,%0" > : BITOP_ADDR(addr) > : "Ir" (nr)); > } > @@ -126,7 +126,7 @@ static inline void clear_bit_unlock(unsigned nr, volatile unsigned long *addr) > > static inline void __clear_bit(int nr, volatile unsigned long *addr) > { > - asm volatile("btr %1,%0" : ADDR : "Ir" (nr)); > + asm volatile("btrl %1,%0" : ADDR : "Ir" (nr)); > } > > /* > @@ -198,7 +198,7 @@ static inline int test_and_set_bit(int nr, volatile unsigned long *addr) > { > int oldbit; > > - asm volatile(LOCK_PREFIX "bts %2,%1\n\t" > + asm volatile(LOCK_PREFIX "btsl %2,%1\n\t" > "sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr) : "memory"); > > return oldbit; > @@ -230,7 +230,7 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr) > { > int oldbit; > > - asm("bts %2,%1\n\t" > + asm("btsl %2,%1\n\t" > "sbb %0,%0" > : "=r" (oldbit), ADDR > : "Ir" (nr)); > @@ -249,7 +249,7 @@ static inline int 
test_and_clear_bit(int nr, volatile unsigned long *addr) > { > int oldbit; > > - asm volatile(LOCK_PREFIX "btr %2,%1\n\t" > + asm volatile(LOCK_PREFIX "btrl %2,%1\n\t" > "sbb %0,%0" > : "=r" (oldbit), ADDR : "Ir" (nr) : "memory"); > > @@ -276,7 +276,7 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr) > { > int oldbit; > > - asm volatile("btr %2,%1\n\t" > + asm volatile("btrl %2,%1\n\t" > "sbb %0,%0" > : "=r" (oldbit), ADDR > : "Ir" (nr)); > diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h > index 0da5200..fda54c9 100644 > --- a/arch/x86/include/asm/percpu.h > +++ b/arch/x86/include/asm/percpu.h > @@ -490,7 +490,7 @@ do { \ > #define x86_test_and_clear_bit_percpu(bit, var) \ > ({ \ > int old__; \ > - asm volatile("btr %2,"__percpu_arg(1)"\n\tsbbl %0,%0" \ > + asm volatile("btrl %2,"__percpu_arg(1)"\n\tsbbl %0,%0" \ > : "=r" (old__), "+m" (var) \ > : "dIr" (bit)); \ > old__; \ From jeremy at goop.org Sun Jul 14 12:23:54 2013 From: jeremy at goop.org (Jeremy Fitzhardinge) Date: Sun, 14 Jul 2013 12:23:54 -0700 Subject: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix In-Reply-To: References: <1373806562-30422-1-git-send-email-artagnon@gmail.com> Message-ID: <51E2FACA.3050801@goop.org> (Resent without HTML) On 07/14/2013 10:19 AM, Linus Torvalds wrote: > Now, there are possible cases where you want to make the size explicit > because you are mixing memory operand sizes and there can be nasty > performance implications of doing a 32-bit write and then doing a > 64-bit read of the result. I'm not actually aware of us having ever > worried/cared about it, but it's a possible source of trouble: mixing > bitop instructions with non-bitop instructions can have some subtle > interactions, and you need to be careful, since the size of the > operand affects both the offset *and* the memory access size. 
The SDM entry for BT mentions that the instruction may touch 2 or 4 bytes depending on the operand size, but doesn't specifically mention that a 64 bit operation size touches 8 bytes - and it doesn't mention anything at all about operand size and access size in BTR/BTS/BTC (unless it's implied as part of the discussion about encoding the MSBs of a constant bit offset in the offset of the addressing mode). Is that an oversight? > The > access size generally is meaningless from a semantic standpoint > (little-endian being the only sane model), but the access size *can* > have performance implications for the write queue forwarding. It looks like that if the base address isn't aligned then neither is the generated access, so you could get a protection fault if it overlaps a page boundary, which is a semantic rather than purely operational difference. J From torvalds at linux-foundation.org Sun Jul 14 12:29:04 2013 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Sun, 14 Jul 2013 12:29:04 -0700 Subject: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix In-Reply-To: <51E2FACA.3050801@goop.org> References: <1373806562-30422-1-git-send-email-artagnon@gmail.com> <51E2FACA.3050801@goop.org> Message-ID: On Sun, Jul 14, 2013 at 12:23 PM, Jeremy Fitzhardinge wrote: > > It looks like that if the base address isn't aligned then neither is the > generated access, so you could get a protection fault if it overlaps a > page boundary, which is a semantic rather than purely operational > difference. You could also get an AC fault for the btq if the thing is only long-aligned. But yes, I checked the Intel manuals too, and the access size is actually not well-specified (even the 16-bit case says "may", I think), so both the page-fault and the alignment fault are purely theoretical. And I'm too lazy to bother trying the (easily testable) alignment fault case in practice, since (a) nobody cares and (b) nobody cares. 
In the (unlikely) situation that somebody actually cares, that somebody should obviously then have to specify "btl" vs "btq". Assuming the hardware cares, which is testable but might be micro-architecture dependent. Linus From torvalds at linux-foundation.org Sun Jul 14 12:49:44 2013 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Sun, 14 Jul 2013 12:49:44 -0700 Subject: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix In-Reply-To: References: <1373806562-30422-1-git-send-email-artagnon@gmail.com> Message-ID: On Sun, Jul 14, 2013 at 12:30 PM, Tim Northover wrote: > > I don't think you've actually tested that, have you? (x86-64) Oh, you're right, for constants > 5 bits you have that other thing going on. I didn't think about the fact that the constant changed in the middle of the thread (it started out as 1). We use the gcc constraint "I" (0-31) in the kernel for this reason. Linus From ck at remobjects.com Sun Jul 14 13:19:09 2013 From: ck at remobjects.com (Carlo Kok) Date: Sun, 14 Jul 2013 22:19:09 +0200 Subject: [LLVMdev] Windows reviewers still needed? In-Reply-To: <87bo65f8t2.fsf@wanadoo.es> References: <87bo65f8t2.fsf@wanadoo.es> Message-ID: <51E307BD.5050301@remobjects.com> On 14-7-2013 17:41, Óscar Fuentes wrote: > Richard writes: > >> I watched some LLVM videos yesterday and there was one where Chandler >> Carruth was saying that LLVM/clang needed more Windows-savvy >> developers to review patches. >> >> Is this still the case? If so, how do I contribute? > > Just to be more verbose than Anton... :-) > > Subscribe to the llvm-commits and cfe-commits mailing lists, watch for > patches touching Windows functionality, and reply to them with your > comments. 
> There's also windows stuff going on in lldb-commits -- Carlo Kok From tobias at grosser.es Sun Jul 14 14:59:26 2013 From: tobias at grosser.es (Tobias Grosser) Date: Sun, 14 Jul 2013 14:59:26 -0700 Subject: [LLVMdev] Analysis of polly-detect overhead in oggenc In-Reply-To: <32e1ec0e.464a.13fddb6c5c7.Coremail.tanmx_star@yeah.net> References: <51E02B5C.3030208@grosser.es> <727cd06.251b.13fd9059546.Coremail.tanmx_star@yeah.net> <51E19CAF.4050500@grosser.es> <73bc0067.345f.13fdb666d45.Coremail.tanmx_star@yeah.net> <51E2352A.6090706@grosser.es> <7aa4b9a.3bcb.13fdc808c5f.Coremail.tanmx_star@yeah.net> <32e1ec0e.464a.13fddb6c5c7.Coremail.tanmx_star@yeah.net> Message-ID: <51E31F3E.3080501@grosser.es> On 07/14/2013 08:05 AM, Star Tan wrote: > I have found that the extremely expensive compile-time overhead comes from the string buffer operation for the "INVALID" macro in the polly-detect pass. > Attached is a hack patch file that simply removes the string buffer operation. This patch file can significantly reduce compile-time overhead when compiling big source code. For example, for oggen*8.ll, the compile time is reduced from 40.5261s (51.2%) to 5.8813s (15.9%) with this patch file. Very nice analysis. I just tried it myself and can verify that for oggenc 16x, your patch reduces the scop-detection time from 90 seconds (80 %) to 0.5 seconds (2.5 %). I think there are two problems: 1) The cost of printing a single LLVM type/value increases with the size of the overall Module. This seems to be because TypeFinder::run() is called each time, without caching in place. The cost of TypeFinder::run() increases with the size of the module, as it basically just performs a scan on the entire Module. 
2) We are formatting the failure messages during normal compilation, even though they are only used in debugging tools like -view-scops. In terms of solutions: It would be interesting to understand why 1) is so slow, especially as it seems to be either a fundamental problem in LLVM IR printing or in the way we use the IR printing infrastructure. On the other hand, for Polly we need to solve 2) anyway. Even if formatting were faster, we should still not do it if it is not needed. As we need to solve 2) anyway, 1) will only hit us when we do debugging/formatting. I assume that in the debugging case the files we are looking into are normally smaller, such that the formatting overhead will not be that big. Hence, I would focus on 2). We could probably just put the code under an NDEBUG ifndef, but I would actually like to keep the messages available even in NDEBUG mode, as we may want to use them to hint users as to why their code cannot be optimized. For this, and also to get rid of another annoyance, the INVALID macro, I think we need to restructure the reporting of the last error, such that formatting of the error messages can be done on demand. Another problem that could be solved at this point is to remove the macro use, which hides the fact that the functions return as soon as INVALID is called, which is plainly ugly. I am not sure how to structure this, but I could imagine some small class hierarchy with a class for each error type. Each class just stores pointers to the data structures it needs to format its error message, but only formats the error on demand. We could then return this class in case of failure and return a NoError class or a NULL pointer in case of success. This change may also help us later to add support for keeping track of all errors we encounter (not just the first one). This is something Andreas and Johannes found helpful earlier. 
Cheers, Tobias g From b0ef at esben-stien.name Sun Jul 14 18:12:28 2013 From: b0ef at esben-stien.name (Esben Stien) Date: Mon, 15 Jul 2013 03:12:28 +0200 Subject: [LLVMdev] libcompiler_rt.a, No such file or directory Message-ID: <87txjwpqwz.fsf@quasar.esben-stien.name> Trying to compile llvm-3.3 and I get this: llvm[4]: Copying runtime library linux/asan-i386 to build dir cp: cannot stat «/pkg/llvm-3.3.src/tools/clang/runtime/compiler-rt/clang_linux/full-i386/libcompiler_rt.a»: Ingen slik fil eller filkatalog llvm[4]: Copying runtime library linux/ubsan-i386 to build dir llvm[4]: Copying runtime library linux/ubsan_cxx-i386 to build dir make[4]: *** [/pkg/llvm-3.3.src/Release/lib/clang/3.3/lib/linux/libclang_rt.full-i386.a] Error 1 make[4]: *** Waiting for unfinished jobs.... cp: cannot stat «/pkg/llvm-3.3.src/tools/clang/runtime/compiler-rt/clang_linux/profile-i386/libcompiler_rt.a»: Ingen slik fil eller filkatalog cp: cannot stat «/pkg/llvm-3.3.src/tools/clang/runtime/compiler-rt/clang_linux/asan-i386/libcompiler_rt.a»cp: cannot stat «/pkg/llvm-3.3.src/tools/clang/runtime/compiler-rt/clang_linux/san-i386/libcompiler_rt.a»: Ingen slik fil eller filkatalog : Ingen slik fil eller filkatalog I'm using gcc-4.8.1 and glibc-2.17 and I'm on 32 bit GNU/Linux. Any idea as to what I can try?. -- Esben Stien is b0ef at e s a http://www. s t n m irc://irc. b - i . e/%23contact sip:b0ef@ e e jid:b0ef@ n n From atrick at apple.com Sun Jul 14 17:56:03 2013 From: atrick at apple.com (Andrew Trick) Date: Sun, 14 Jul 2013 17:56:03 -0700 Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage. In-Reply-To: <51E087E2.5040101@gmail.com> References: <51E087E2.5040101@gmail.com> Message-ID: <344919F0-3ED1-4EED-865D-8AC273821A3B@apple.com> On Jul 12, 2013, at 3:49 PM, Shuxin Yang wrote: > 3.2 Compile partitions independently > -------------------------------------- > > There are two camps: one camp advocate compiling partitions via multi-process, > the other one favor multi-thread. 
> > Inside Apple compiler teams, I'm the only one belonging to the 1st camp. I think > while multi-proc sounds a bit red-neck, it has its advantage for this purpose, and > while multi-thread is certainly more eye-popping, it has its advantage > as well. > > The advantages of multi-proc are: > 1) easier to implement; each process runs in its own address space. > We don't need to worry about them interfering with each other. > > 2) huge, if not unlimited, address space. > > The disadvantage is that it's expensive. But I guess the cost is > almost negligible compared to the overall IPO compilation. > > The advantages of multi-threads I can imagine are: > 1) sounds fancy > 2) it is light-weight > 3) inter-thread communication is easier than IPC. > > Its disadvantages are: > 1) Oftentimes we will come across race conditions, and it takes > an awfully long time to figure them out. While the code is supposed > to be multi-thread safe, we might miss some tricky cases. > Trouble-shooting race conditions is a nightmare. > > 2) Small address space. This is a big problem if the compiler > is built 32-bit. In that case, the compiler is not able to bring > lots of stuff into memory even if the HW does > provide ample memory. > > 3) The thread-safe run-time lib is more expensive. > I once linked a compiler using -lpthread (I did not have to) on a > UNIX platform, and saw the compiler slow down by about 1/3. > > I'm not able to convince the folks in the other camp, nor are they > able to convince me. I decided to implement both. Fortunately, this > part is not difficult; it seems rather easy to crank out one within a short > period of time. It would be interesting to compare them side-by-side, > and see which camp loses :-). On the other hand, if we run into race-condition > problems, we choose the multi-proc version as a fall-back. 
While I am a self-proclaimed multi-process red-neck, in this case I would prefer to see a multi-threaded implementation because I want to verify that LLVMContext can be used as advertised. I'm sure some extra care will be needed to report failures/diagnostics, but we should start with the assumption that this approach is not significantly harder than multi-process because that's how we advertise the design. If any of the multi-threaded disadvantages you point out are real, I would like to find out about it. 1. Race Conditions: We should be able to verify that the thread-parallel vs. sequential or multi-process compilation generate the same result. If they diverge, we would like to know about the bug so it can be fixed--independent of LTO. 2. Small Address Space with LTO. We don't need to design around this hypothetical case. 3. Expensive thread-safe runtime lib. We should not speculate that platforms that we, as the LLVM community, care about have this problem. Let's assume that our platforms are well implemented unless we have data to the contrary. (Personally, I would even love to use TLS in the compiler to vastly simplify API design in the backend, but I am not going to be popular for saying so). We should be able to decompose each step of compilation for debugging. So the multi-process "implementation" should just be a degenerate form of threading with a bit of driver magic if you want to automate it. -Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: From atrick at apple.com Sun Jul 14 17:57:29 2013 From: atrick at apple.com (Andrew Trick) Date: Sun, 14 Jul 2013 17:57:29 -0700 Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage. In-Reply-To: <51E087E2.5040101@gmail.com> References: <51E087E2.5040101@gmail.com> Message-ID: <0DFE5B55-3FC3-435C-9971-040ED72D0328@apple.com> On Jul 12, 2013, at 3:49 PM, Shuxin Yang wrote: > 6) Miscellaneous > =========== > Will partitioning degrade performance in theory. 
I think it depends on the definition of > performance. If performance means execution-time, I guess it does not. > However, if performance includes code-size, I think it may have some > negative impact. > The following are a few scenarios: > > - constants generated by the post-IPO passes are not shared across > partitions > - dead functions may be detected during the post-IPO stage, and they may not be > deleted. I don't know if it's feasible, but stable linker output, independent of the partitioning, is highly desirable. One of the most irritating performance regressions to track down involves different versions of the host linker. If partitioning decisions are thrown into the mix, this could be annoying. Is it possible for the final link to do a better job cleaning up? -Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: From echristo at gmail.com Sun Jul 14 18:38:50 2013 From: echristo at gmail.com (Eric Christopher) Date: Sun, 14 Jul 2013 18:38:50 -0700 Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage. In-Reply-To: <0DFE5B55-3FC3-435C-9971-040ED72D0328@apple.com> References: <51E087E2.5040101@gmail.com> <0DFE5B55-3FC3-435C-9971-040ED72D0328@apple.com> Message-ID: On Sun, Jul 14, 2013 at 5:57 PM, Andrew Trick wrote: > > On Jul 12, 2013, at 3:49 PM, Shuxin Yang wrote: > > 6) Miscellaneous > =========== > Will partitioning degrade performance in theory. I think it depends on > the definition of > performance. If performance means execution-time, I guess it does not. > However, if performance includes code-size, I think it may have some > negative impact. > The following are a few scenarios: > > - constants generated by the post-IPO passes are not shared across > partitions > - dead functions may be detected during the post-IPO stage, and they may not be > deleted. > > > I don't know if it's feasible, but stable linker output, independent of the > partitioning, is highly desirable. 
One of the most irritating performance > regressions to track down involves different versions of the host linker. If > partitioning decisions are thrown into the mix, this could be annoying. Is > it possible for the final link to do a better job cleaning up? While I haven't yet read the rest of the proposal, I'm going to comment on this in particular. In my view this is an absolute requirement, as the compiler should produce the same output given the same input every time with no deviation. -eric From atrick at apple.com Sun Jul 14 19:07:21 2013 From: atrick at apple.com (Andrew Trick) Date: Sun, 14 Jul 2013 19:07:21 -0700 Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage. In-Reply-To: References: <51E087E2.5040101@gmail.com> <0DFE5B55-3FC3-435C-9971-040ED72D0328@apple.com> Message-ID: <7557C3F3-0682-479F-B93D-07EB2BDC16D0@apple.com> On Jul 14, 2013, at 6:38 PM, Eric Christopher wrote: > On Sun, Jul 14, 2013 at 5:57 PM, Andrew Trick wrote: >> >> On Jul 12, 2013, at 3:49 PM, Shuxin Yang wrote: >> >> 6) Miscellaneous >> =========== >> Will partitioning degrade performance in theory. I think it depends on >> the definition of >> performance. If performance means execution-time, I guess it does not. >> However, if performance includes code-size, I think it may have some >> negative impact. >> The following are a few scenarios: >> >> - constants generated by the post-IPO passes are not shared across >> partitions >> - dead functions may be detected during the post-IPO stage, and they may not be >> deleted. >> >> >> I don't know if it's feasible, but stable linker output, independent of the >> partitioning, is highly desirable. One of the most irritating performance >> regressions to track down involves different versions of the host linker. If >> partitioning decisions are thrown into the mix, this could be annoying. Is >> it possible for the final link to do a better job cleaning up? 
> > While I haven't yet read the rest of the proposal, I'm going to comment > on this in particular. In my view this is an absolute requirement, as > the compiler should produce the same output given the same input every > time with no deviation. The partitioning should be deterministic. It’s just that the linker output now depends on the partitioning heuristics. As long as that decision is based on the input (not the host system), it still meets Eric’s requirements. I just think it’s unfortunate that post-IPO partitioning (or, more generally, parallel codegen) affects the output, but that may be hard to avoid. It would be nice to be able to tune the partitioning for compile time without worrying about code quality. Sorry for the tangential thought here... it seems that most of Shuxin’s proposal is actually independent of LTO, even though the prototype and primary goal is enabling LTO. -Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: From dberlin at dberlin.org Sun Jul 14 20:15:22 2013 From: dberlin at dberlin.org (Daniel Berlin) Date: Sun, 14 Jul 2013 20:15:22 -0700 Subject: [LLVMdev] Analysis of polly-detect overhead in oggenc In-Reply-To: <51E31F3E.3080501@grosser.es> References: <51E02B5C.3030208@grosser.es> <727cd06.251b.13fd9059546.Coremail.tanmx_star@yeah.net> <51E19CAF.4050500@grosser.es> <73bc0067.345f.13fdb666d45.Coremail.tanmx_star@yeah.net> <51E2352A.6090706@grosser.es> <7aa4b9a.3bcb.13fdc808c5f.Coremail.tanmx_star@yeah.net> <32e1ec0e.464a.13fddb6c5c7.Coremail.tanmx_star@yeah.net> <51E31F3E.3080501@grosser.es> Message-ID: On Sun, Jul 14, 2013 at 2:59 PM, Tobias Grosser wrote: > On 07/14/2013 08:05 AM, Star Tan wrote: >> >> I have found that the extremely expensive compile-time overhead comes from >> the string buffer operation for the "INVALID" macro in the polly-detect pass. >> Attached is a hack patch file that simply removes the string buffer >> operation. 
>> This patch file can significantly reduce compile-time overhead >> when compiling big source code. For example, for oggen*8.ll, the compile >> time is reduced from 40.5261s (51.2%) to 5.8813s (15.9%) with this patch >> file. > > > Very nice analysis. I just tried it myself and can verify that for > oggenc 16x, your patch reduces the scop-detection time from 90 seconds > (80 %) to 0.5 seconds (2.5 %). > > I think there are two problems: > > 1) The cost of printing a single LLVM type/value increases with > the size of the overall Module. This seems to be because > TypeFinder::run() is called each time, without any caching in place. > The cost of TypeFinder::run() increases with the size of the > module, as it basically just performs a scan of the entire Module. > > 2) We are formatting the failure messages during normal compilation, > even though they are only used in debugging tools like -view-scops. > > In terms of solutions: > > It would be interesting to understand why 1) is so slow, especially as > it seems to be either a fundamental problem in LLVM IR printing or in the > way we use the IR printing infrastructure. I once analyzed this for GVN when I hit a similar issue (the type finder being rerun again and again while printing statements), and I came to the conclusion that the printing infrastructure does not expect to be printing things regularly; the type finder only gets called this much during debug printing because of the call paths it goes through. The basic cause is that Value::print creates AssemblyWriter objects every time it is called. This in turn calls the type finder, for every single call to Value::print. The type finder is being called to build up the list of named types in the module, so that when it prints things, it can use the right name for the type instead of an anonymous name. 
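Daniel's diagnosis can be reduced to a toy model (hypothetical names: `ToyModule` merely stands in for an LLVM Module, and `scanAllTypes` for `TypeFinder::run()`). Recreating the writer per printed value redoes an O(module-size) scan every time, while computing the name map once makes the per-print cost constant:

```cpp
#include <map>
#include <string>
#include <vector>

// Toy stand-in for an LLVM Module; scanAllTypes() mimics TypeFinder::run(),
// an O(module size) walk. scanCount records how often that walk happens.
struct ToyModule {
  std::vector<std::string> namedTypes;
  mutable int scanCount = 0;

  std::map<std::string, bool> scanAllTypes() const {
    ++scanCount; // full-module scan, paid on every call
    std::map<std::string, bool> names;
    for (const std::string &t : namedTypes)
      names[t] = true;
    return names;
  }
};

// What Value::print effectively does today: a fresh writer, and
// therefore a fresh type scan, for every printed value.
int printValuesUncached(const ToyModule &m, int numPrints) {
  for (int i = 0; i < numPrints; ++i)
    (void)m.scanAllTypes();
  return m.scanCount;
}

// The fix Daniel suggests: build the name map once and reuse it.
int printValuesCached(const ToyModule &m, int numPrints) {
  const auto names = m.scanAllTypes(); // one scan up front
  for (int i = 0; i < numPrints; ++i)
    (void)names.size(); // every print reuses the cache
  return m.scanCount;
}
```

With n printed values over a module of m named types, the uncached path does n full scans (O(n*m)) while the cached path does one; that is the difference between 90 seconds and 0.5 seconds above.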
If you do the following: diff --git a/lib/VMCore/AsmWriter.cpp b/lib/VMCore/AsmWriter.cpp index 7ef1131..8a2206b 100644 --- a/lib/VMCore/AsmWriter.cpp +++ b/lib/VMCore/AsmWriter.cpp @@ -2118,7 +2118,7 @@ void Value::print(raw_ostream &ROS, AssemblyAnnotationWrit if (const Instruction *I = dyn_cast<Instruction>(this)) { const Function *F = I->getParent() ? I->getParent()->getParent() : 0; SlotTracker SlotTable(F); - AssemblyWriter W(OS, SlotTable, getModuleFromVal(I), AAW); + AssemblyWriter W(OS, SlotTable, NULL, AAW); W.printInstruction(*I); } else if (const BasicBlock *BB = dyn_cast<BasicBlock>(this)) { SlotTracker SlotTable(BB->getParent()); (This is an old patch, AsmWriter.cpp is not in lib/VMCore anymore, but you can see what is happening.) You will not be shown named types in the operand printing, but it will not be slow anymore. This is, of course, a complete hack just to work around the issue if you need to get other work done. The real fix is either to stop recreating these AssemblyWriter objects, or to improve caching in the bowels of it so that it doesn't need to rerun the type finder again and again if nothing has changed. > On the other side, for Polly > we need to solve 2) anyway. Even if formatting were faster, we > should still not do it if not needed. As we need to solve 2) anyway, 1) > will only hit us when we do debugging/formatting. I assume in the case of > debugging, the files we are looking into are normally smaller, such that > the formatting overhead will not be that big. > > Hence, I would focus on 2). We could probably just put the code under an > NDEBUG ifndef, but I would actually like to keep the messages available even in > NDEBUG mode, as we may want to use the error messages to hint users at > why their code cannot be optimized. For this, and also to get rid of > another annoyance, the INVALID macro, I think we need to restructure the > reporting of the last error, such that formatting of the error messages > can be done on-demand. 
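One possible shape for such on-demand formatting, as a hedged sketch (`RejectReason` and its fields are made up for illustration, not Polly's actual classes): the detection path only records raw data, and the string is built solely when a debugging tool or user-facing hint actually asks for it.

```cpp
#include <sstream>
#include <string>

// Hypothetical on-demand diagnostic: detection stores only raw data
// (here a kind string and a line number); no string formatting happens
// on the hot compilation path.
struct RejectReason {
  const char *kind;
  int line;

  // The message is built only when somebody actually asks for it,
  // e.g. from -view-scops or a user-facing optimization hint.
  std::string message() const {
    std::ostringstream os;
    os << "scop rejected: " << kind << " at line " << line;
    return os.str();
  }
};

// A failing detection returns the record instead of a formatted string.
RejectReason detectFailure() { return RejectReason{"non-affine access", 42}; }
```

The same record type could later be collected into a list to keep track of all rejection reasons, not just the first one, as suggested above.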
Another problem that could be solved at this > point is to remove the macro use, which hides the fact that the > functions return as soon as INVALID is called, which is plainly ugly. > > I am not sure how to structure this, but I could imagine some small > class hierarchy that has a class for each error type. Each class just > stores pointers to the data structures it needs to format its error > message, but only formats the error on-demand. We could then return this > class in case of failure and return a NoError class or a NULL pointer in > case of success. > > This change may also help us to later add support to keep track of all > errors we encounter (not just the first one). This is something Andreas > and Johannes found helpful earlier. > > Cheers, > Tobias > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From ak at linux.jf.intel.com Sun Jul 14 14:14:46 2013 From: ak at linux.jf.intel.com (Andi Kleen) Date: Sun, 14 Jul 2013 14:14:46 -0700 Subject: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix In-Reply-To: <51E2FAB9.9050900@goop.org> References: <1373806562-30422-1-git-send-email-artagnon@gmail.com> <51E2FAB9.9050900@goop.org> Message-ID: <20130714211446.GP5643@tassilo.jf.intel.com> I think the best approach would be to find some way to implement LOCK prefix patching using atomic compiler intrinsics and then switch to those. Then all this inline assembler horror could be ifdef'ed away for old compilers only, and the generated code would likely be better, as the compiler could optimize more. Or just give up on LOCK patching, as single-CPU systems and VMs are less and less interesting? 
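For illustration (a sketch, not the kernel's actual code), with compiler atomic support the LOCK prefix becomes the compiler's job rather than hand-written inline assembly:

```cpp
#include <atomic>

// With a compiler-understood atomic, the x86 backend emits the LOCK
// prefix itself (e.g. "lock xadd"); no hand-written inline assembly,
// and the optimizer can still reason about the surrounding code.
int incrementCounter(std::atomic<int> &counter) {
  return counter.fetch_add(1) + 1; // returns the post-increment value
}
```

This is the tradeoff Andi describes: the intrinsic form loses the ability to runtime-patch the LOCK prefix away on uniprocessor systems, but keeps the code portable and optimizable.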
-Andi -- ak at linux.intel.com -- Speaking for myself only From clattner at apple.com Sun Jul 14 21:52:47 2013 From: clattner at apple.com (Chris Lattner) Date: Sun, 14 Jul 2013 21:52:47 -0700 Subject: [LLVMdev] Enabling the SLP vectorizer by default for -O3 In-Reply-To: References: Message-ID: <96EBFCFE-8AEA-428D-A72D-2DDD0335CB95@apple.com> On Jul 13, 2013, at 11:30 PM, Nadav Rotem wrote: > Hi, > > LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in straight-line code. It is currently not enabled by default, and people who want to experiment with it can use the clang command line flag “-fslp-vectorize”. I ran LLVM’s test suite with and without the SLP vectorizer on a Sandybridge mac (using SSE4, w/o AVX). Based on my performance measurements (below) I would like to enable the SLP-vectorizer by default on -O3. I would like to hear what others in the community think about this and give other people the opportunity to perform their own performance measurements. This looks great, Nadav. The performance wins are really big. Have you investigated the bh and bullet regressions, though? We should at least understand what is going wrong there. bh is pretty tiny, so it should be straightforward. It would also be really useful to see what the code size and compile time impact is. 
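For readers new to SLP vectorization, here is a minimal example of the straight-line pattern it targets (illustrative only, not taken from the test suite): four isomorphic, independent scalar operations on adjacent memory that can be merged into a single `<4 x float>` operation.

```cpp
// Four independent, isomorphic scalar operations on adjacent elements --
// exactly the shape the SLP vectorizer can roll into one vector add.
void addQuads(float *a, const float *b, const float *c) {
  a[0] = b[0] + c[0];
  a[1] = b[1] + c[1];
  a[2] = b[2] + c[2];
  a[3] = b[3] + c[3];
}
```

Unlike the loop vectorizer, SLP needs no loop at all; it only needs a straight-line group of similar, independent instructions with consecutive memory accesses.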
-Chris > > — Performance Gains — > SingleSource/Benchmarks/Misc/matmul_f64_4x4 -53.68% > MultiSource/Benchmarks/Olden/power/power -18.55% > MultiSource/Benchmarks/TSVC/LoopRerolling-flt/LoopRerolling-flt -14.71% > SingleSource/Benchmarks/Misc/flops-6 -11.02% > SingleSource/Benchmarks/Misc/flops-5 -10.03% > MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt -8.37% > External/Nurbs/nurbs -7.98% > SingleSource/Benchmarks/Misc/pi -7.29% > External/SPEC/CINT2000/252_eon/252_eon -5.78% > External/SPEC/CFP2006/444_namd/444_namd -4.52% > External/SPEC/CFP2000/188_ammp/188_ammp -4.45% > MultiSource/Applications/SIBsim4/SIBsim4 -3.58% > MultiSource/Benchmarks/TSVC/LoopRerolling-dbl/LoopRerolling-dbl -3.52% > SingleSource/Benchmarks/Misc-C++/Large/sphereflake -2.96% > MultiSource/Benchmarks/TSVC/LinearDependence-dbl/LinearDependence-dbl -2.75% > MultiSource/Benchmarks/VersaBench/beamformer/beamformer -2.70% > MultiSource/Benchmarks/TSVC/NodeSplitting-dbl/NodeSplitting-dbl -1.95% > SingleSource/Benchmarks/Misc/flops -1.89% > SingleSource/Benchmarks/Misc/oourafft -1.71% > MultiSource/Benchmarks/mafft/pairlocalalign -1.16% > External/SPEC/CFP2006/447_dealII/447_dealII -1.06% > > — Regressions — > MultiSource/Benchmarks/Olden/bh/bh 22.47% > MultiSource/Benchmarks/Bullet/bullet 7.31% > SingleSource/Benchmarks/Misc-C++-EH/spirit 5.68% > SingleSource/Benchmarks/SmallPT/smallpt 3.91% > > Thanks, > Nadav > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From nrotem at apple.com Sun Jul 14 22:55:42 2013 From: nrotem at apple.com (Nadav Rotem) Date: Sun, 14 Jul 2013 22:55:42 -0700 Subject: [LLVMdev] Enabling the SLP vectorizer by default for -O3 In-Reply-To: <96EBFCFE-8AEA-428D-A72D-2DDD0335CB95@apple.com> References: <96EBFCFE-8AEA-428D-A72D-2DDD0335CB95@apple.com> Message-ID: On Jul 14, 2013, at 9:52 PM, Chris Lattner wrote: > > On 
Jul 13, 2013, at 11:30 PM, Nadav Rotem wrote: > >> Hi, >> >> LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in straight-line code. It is currently not enabled by default, and people who want to experiment with it can use the clang command line flag “-fslp-vectorize”. I ran LLVM’s test suite with and without the SLP vectorizer on a Sandybridge mac (using SSE4, w/o AVX). Based on my performance measurements (below) I would like to enable the SLP-vectorizer by default on -O3. I would like to hear what others in the community think about this and give other people the opportunity to perform their own performance measurements. > > This looks great, Nadav. The performance wins are really big. Have you investigated the bh and bullet regressions, though? Thanks. Yes, I looked at both. The hot function in BH is “gravsub”. The vectorized IR looks fine and the assembly looks fine, but for some reason Instruments reports that the first vector-subtract instruction takes 18% of the time. The regression happens both with the VEX prefix and without. I suspected that the problem is the movupd's that load xmm0 and xmm1. I started looking at some performance counters on Friday, but I did not find anything suspicious yet. +0x00 movupd 16(%rsi), %xmm0 +0x05 movupd 16(%rsp), %xmm1 +0x0b subpd %xmm1, %xmm0 <———— 18% of the runtime of bh ? +0x0f movapd %xmm0, %xmm2 +0x13 mulsd %xmm2, %xmm2 +0x17 xorpd %xmm1, %xmm1 +0x1b addsd %xmm2, %xmm1 I spent less time on Bullet. Bullet also has one hot function (“resolveSingleConstraintRowLowerLimit”). On this code the vectorizer generates several trees that use the <3 x float> type. This is risky because the loads/stores are inefficient, but unfortunately triples of RGB and XYZ are very popular in some domains and we do want to vectorize them. I skimmed through the IR and the assembly and I did not see anything too bad. 
The next step would be to do a binary search on the places where the vectorizer fires to locate the bad pattern. On AVX we have another regression that I did not mention: Flops-7. When we vectorize we cause more spills because we do a poor job scheduling non-destructive source instructions (related to PR10928). Hopefully Andy’s scheduler will fix this regression once it is enabled. I did not measure code size, but I did measure compile time. There are 4-5 workloads (not counting workloads that run below 0.5 seconds) where the compile time increase is more than 5%. I am aware of a problem in the (quadratic) code that looks for consecutive stores. This code calls SCEV too many times. I plan to fix this. Thanks, Nadav > We should at least understand what is going wrong there. bh is pretty tiny, so it should be straight-forward. It would also be really useful to see what the code size and compile time impact is. > > -Chris > >> >> — Performance Gains — >> SingleSource/Benchmarks/Misc/matmul_f64_4x4 -53.68% >> MultiSource/Benchmarks/Olden/power/power -18.55% >> MultiSource/Benchmarks/TSVC/LoopRerolling-flt/LoopRerolling-flt -14.71% >> SingleSource/Benchmarks/Misc/flops-6 -11.02% >> SingleSource/Benchmarks/Misc/flops-5 -10.03% >> MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt -8.37% >> External/Nurbs/nurbs -7.98% >> SingleSource/Benchmarks/Misc/pi -7.29% >> External/SPEC/CINT2000/252_eon/252_eon -5.78% >> External/SPEC/CFP2006/444_namd/444_namd -4.52% >> External/SPEC/CFP2000/188_ammp/188_ammp -4.45% >> MultiSource/Applications/SIBsim4/SIBsim4 -3.58% >> MultiSource/Benchmarks/TSVC/LoopRerolling-dbl/LoopRerolling-dbl -3.52% >> SingleSource/Benchmarks/Misc-C++/Large/sphereflake -2.96% >> MultiSource/Benchmarks/TSVC/LinearDependence-dbl/LinearDependence-dbl -2.75% >> MultiSource/Benchmarks/VersaBench/beamformer/beamformer -2.70% >> MultiSource/Benchmarks/TSVC/NodeSplitting-dbl/NodeSplitting-dbl -1.95% >> SingleSource/Benchmarks/Misc/flops -1.89% >> 
SingleSource/Benchmarks/Misc/oourafft -1.71% >> MultiSource/Benchmarks/mafft/pairlocalalign -1.16% >> External/SPEC/CFP2006/447_dealII/447_dealII -1.06% >> >> — Regressions — >> MultiSource/Benchmarks/Olden/bh/bh 22.47% >> MultiSource/Benchmarks/Bullet/bullet 7.31% >> SingleSource/Benchmarks/Misc-C++-EH/spirit 5.68% >> SingleSource/Benchmarks/SmallPT/smallpt 3.91% >> >> Thanks, >> Nadav >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From godepankaj at yahoo.com Sun Jul 14 23:28:46 2013 From: godepankaj at yahoo.com (Pankaj Gode) Date: Sun, 14 Jul 2013 23:28:46 -0700 (PDT) Subject: [LLVMdev] Regarding scope information for variable declaration. In-Reply-To: <1373598022502-59268.post@n5.nabble.com> References: <1343657370.33274.YahooMailNeo@web165005.mail.bf1.yahoo.com> <1373598022502-59268.post@n5.nabble.com> Message-ID: <1373869726.92026.YahooMailNeo@web165003.mail.bf1.yahoo.com> Hi Eric, I was considering machine instructions to get scope information, and a variable declaration does not correspond to a machine instruction; hence the problem, i.e. no scope is associated with it. If 'i' is initialized in the 'if' scope, then we get variable 'i' mapped to the correct scope, as a corresponding machine instruction is generated for it. This is not a problem, I thought, as we can't expect a variable declaration to appear in a machine instruction. 
So instead of using machine instructions to collect scope information (as is done by the LexicalScope pass), I had written code to collect scope information based on LLVM Instructions, iterating over 'Function->BasicBlock' instead of 'MachineFunction->MachineBasicBlock': const Function *F1 = MF->getFunction(); for (Function::const_iterator BB = F1->begin(), E = F1->end(); BB != E; ++BB) { for (BasicBlock::const_iterator ii = BB->begin(), ie = BB->end(); ii != ie; ++ii) { const Instruction *I = ii; //I->dump(); // debug DebugLoc MIDB = I->getDebugLoc(); } } Though this is an overhead, as the scope information already exists, I need to collect specific information such as 'start line, end line, start column, end column' (the end-line information has to be derived, as it is not obvious). Collecting information this way allowed me to get correct scope information, and hence I was able to map the variable declaration to the scope. It worked for me this way.   Regards, Pankaj   ________________________________ From: eric.lew To: llvmdev at cs.uiuc.edu Sent: Friday, July 12, 2013 8:30 AM Subject: Re: [LLVMdev] Regarding scope information for variable declaration. I have the same demand. Have you resolved this problem? If so, would you share the solution with me? Best Regards. Eric -- View this message in context: http://llvm.1065342.n5.nabble.com/Regarding-scope-information-for-variable-declaration-tp47707p59268.html Sent from the LLVM - Dev mailing list archive at Nabble.com. _______________________________________________ LLVM Developers mailing list LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu/ http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From godepankaj at yahoo.com Sun Jul 14 23:32:41 2013 From: godepankaj at yahoo.com (Pankaj Gode) Date: Sun, 14 Jul 2013 23:32:41 -0700 (PDT) Subject: [LLVMdev] Regarding scope information for variable declaration. In-Reply-To: <1373598022502-59268.post@n5.nabble.com> References: <1343657370.33274.YahooMailNeo@web165005.mail.bf1.yahoo.com> <1373598022502-59268.post@n5.nabble.com> Message-ID: <1373869961.27602.YahooMailNeo@web165006.mail.bf1.yahoo.com> Hi Eric, I was considering machine instructions to get scope information, and a variable declaration does not correspond to a machine instruction; hence the problem, i.e. no scope is associated with it. If 'i' is initialized in the 'if' scope, then we get variable 'i' mapped to the correct scope, as a corresponding machine instruction is generated for it. This is not a problem, I thought, as we can't expect a variable declaration to appear in a machine instruction. So instead of using machine instructions to collect scope information (as is done by the LexicalScope pass), I had written code to collect scope information based on LLVM Instructions. I did this by iterating over 'Function->BasicBlock' instead of 'MachineFunction->MachineBasicBlock'. const Function *F1 = MF->getFunction(); for (Function::const_iterator BB = F1->begin(), E = F1->end(); BB != E; ++BB) { for (BasicBlock::const_iterator ii = BB->begin(), ie = BB->end(); ii != ie; ++ii) { const Instruction *I = ii; //I->dump(); // debug DebugLoc MIDB = I->getDebugLoc(); } } Though this is an overhead, as the scope information already exists, I need to collect specific information such as 'start line, end line, start column, end column' (the end-line information has to be derived, as it is not obvious). Collecting information this way allowed me to get correct scope information, and hence I was able to map the variable declaration to the scope. It worked for me this way.   
Regards, Pankaj     ________________________________ From: eric.lew To: llvmdev at cs.uiuc.edu Sent: Friday, July 12, 2013 8:30 AM Subject: Re: [LLVMdev] Regarding scope information for variable declaration. I have the same demand. Have you resolved this problem? If so, would you share the solution with me? Best Regards. Eric -- View this message in context: http://llvm.1065342.n5.nabble.com/Regarding-scope-information-for-variable-declaration-tp47707p59268.html Sent from the LLVM - Dev mailing list archive at Nabble.com. _______________________________________________ LLVM Developers mailing list LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu/ http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From oleg.maslov at intel.com Sun Jul 14 23:47:50 2013 From: oleg.maslov at intel.com (Maslov, Oleg) Date: Mon, 15 Jul 2013 06:47:50 +0000 Subject: [LLVMdev] How to interrupt PassManager optimizations flow? Message-ID: <757B88EACEC81F43A3217A03F99B697E2E6EEBF6@IRSMSX101.ger.corp.intel.com> Hi, I have a PassManager object storing module optimizations. One of the optimizations performs an analysis which basically checks the code for being "valid". If the code is not "valid", we should break the optimization flow in the PassManager and report an error. Does PassManager support interrupting the flow of optimizations if one of the optimizations raises a "stop" flag? Thanks, -Oleg -------------------------------------------------------------------- Closed Joint Stock Company Intel A/O Registered legal address: Krylatsky Hills Business Park, 17 Krylatskaya Str., Bldg 4, Moscow 121614, Russian Federation This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. 
If you are not the intended recipient, please contact the sender and delete all copies. -------------- next part -------------- An HTML attachment was scrubbed... URL: From eirc.lew at gmail.com Mon Jul 15 02:40:56 2013 From: eirc.lew at gmail.com (Eric Lu) Date: Mon, 15 Jul 2013 17:40:56 +0800 Subject: [LLVMdev] Fwd: Regarding scope information for variable declaration. In-Reply-To: <1373869961.27602.YahooMailNeo@web165006.mail.bf1.yahoo.com> References: <1343657370.33274.YahooMailNeo@web165005.mail.bf1.yahoo.com> <1373598022502-59268.post@n5.nabble.com> <1373869961.27602.YahooMailNeo@web165006.mail.bf1.yahoo.com> Message-ID: Thanks for your reply, Pankaj. Actually, I have done it very similarly to yours. But I think that for my needs it is better to implement this in the front end. Maybe I will re-implement it later in clang. ---------- Forwarded message ---------- From: Pankaj Gode [via LLVM] Date: Mon, Jul 15, 2013 at 2:35 PM Subject: Re: Regarding scope information for variable declaration. To: "eric.lew" Hi Eric, I was considering machine instructions to get scope information, and a variable declaration does not correspond to a machine instruction; hence the problem, i.e. no scope is associated with it. If 'i' is initialized in the 'if' scope, then we get variable 'i' mapped to the correct scope, as a corresponding machine instruction is generated for it. This is not a problem, I thought, as we can't expect a variable declaration to appear in a machine instruction. So instead of using machine instructions to collect scope information (as is done by the LexicalScope pass), I had written code to collect scope information based on LLVM Instructions. I did this by iterating over 'Function->BasicBlock' instead of 'MachineFunction->MachineBasicBlock'. 
const Function *F1 = MF->getFunction(); for(Function::const_iterator BB = F1->begin(), E = F1->end(); BB != E; ++BB) { for(BasicBlock::const_iterator ii = BB->begin(), ie = BB->end(); ii != ie; ++ii) { const Instruction *I = ii; //I->dump();//debug DebugLoc MIDB = I->getDebugLoc(); } } Though this is an overhead as scope information exists, but I need to collect specific information such as 'start line, end line, start column, end column' (End line information should be derived as is not obvious). Collecting information this way allowed me to get correct scope information, and hence I was able to map the variable declaration to the scope. It worked for me this way. Regards, Pankaj *From:* eric.lew <[hidden email] > *To:* [hidden email] *Sent:* Friday, July 12, 2013 8:30 AM *Subject:* Re: [LLVMdev] Regarding scope information for variable declaration. I have the same demand. Have you resolved this problems? if so, would you share me the solution? Best Regards. Eric -- View this message in context: http://llvm.1065342.n5.nabble.com/Regarding-scope-information-for-variable-declaration-tp47707p59268.html Sent from the LLVM - Dev mailing list archive at Nabble.com. _______________________________________________ LLVM Developers mailing list [hidden email] http://llvm.cs.uiuc.edu/ http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev _______________________________________________ LLVM Developers mailing list [hidden email] http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev ------------------------------ If you reply to this email, your message will be added to the discussion below: http://llvm.1065342.n5.nabble.com/Regarding-scope-information-for-variable-declaration-tp47707p59345.html To unsubscribe from Regarding scope information for variable declaration., click here . NAML -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chandlerc at google.com Mon Jul 15 04:44:09 2013 From: chandlerc at google.com (Chandler Carruth) Date: Mon, 15 Jul 2013 04:44:09 -0700 Subject: [LLVMdev] Enabling the SLP vectorizer by default for -O3 In-Reply-To: References: Message-ID: On Sun, Jul 14, 2013 at 12:07 AM, Chandler Carruth wrote: > I'll take it for a spin on our benchmarks. It'll be a bit before I can go in and reduce it, but I thought I would mention that I've seen just one new crasher; it's in part of GLU's reference implementation of libtess, in normal.c... No real details yet, but in case you're aware of it or someone else knows how to build that... -------------- next part -------------- An HTML attachment was scrubbed... URL: From jobnoorman at gmail.com Mon Jul 15 05:45:47 2013 From: jobnoorman at gmail.com (Job Noorman) Date: Mon, 15 Jul 2013 14:45:47 +0200 Subject: [LLVMdev] Question about LLVM r184574 Message-ID: <2629216.xW9BMigDW3@squat> Hi Andrew, While working on the MSP430 backend, I noticed that code which compiled fine before now hits an assert that you recently inserted in r184574. More specifically, in InlineSpiller.cpp:1076 the following assert is triggered: > assert(MO->isDead() && "Cannot fold physreg def"); I wouldn't be surprised if the underlying cause is in the MSP430 backend, but I wanted to consult you before diving into the code. Is this assert really necessary? If I change the code to simply "continue" on "!MO->isDead()", everything seems to be fine and all testcases still pass. Regards, Job From elena.demikhovsky at intel.com Mon Jul 15 05:51:31 2013 From: elena.demikhovsky at intel.com (Demikhovsky, Elena) Date: Mon, 15 Jul 2013 12:51:31 +0000 Subject: [LLVMdev] LLVM x86 backend for Intel MIC : trying it out and questions In-Reply-To: References: Message-ID: 1) Is there actually a 32-bit mode for MIC? 32-bit ELFs are not recognized, so... There is no 32-bit KNC. 2) MIC ISA is 32-bit ISA (no SSE/MMX) plus 256-bit AVX-like vectors? 
No, 256-bit vectors are not supported. KNC is a scalar ISA (Knights Corner supports a subset of the Intel 64 Architecture instructions) plus 512-bit vectors and masks. 3) then does the MIC calling convention permit generation of programs that use only the 32-bit x86 ISA? In other words, in the common case, does the calling convention require use of zmm registers? Please check what ICC does. X87 registers are supported. - Elena -----Original Message----- From: Dmitry Mikushin [mailto:dmitry at kernelgen.org] Sent: Friday, July 12, 2013 23:44 To: Demikhovsky, Elena Cc: LLVM Developers Mailing List Subject: Re: [LLVMdev] LLVM x86 backend for Intel MIC : trying it out and questions Hello Elena, Thanks for the info! Since Knights Landing (KNL) is also going to be shipped in the form of a host CPU, it will have to have open-source support :) But given that KNL was only announced 1 month ago, we should expect up to 1.5 years for it to become somewhat widespread, i.e. 2014-2015. Meanwhile, I still hope to perform some KNC evaluation, so answers to the above questions are much appreciated! Best, - D. 2013/7/12 Demikhovsky, Elena : > Hello Dmitry, > > I'm working on the KNL backend and plan to push it to open source once the ISA becomes public. We do not plan to support the KNC architecture in open source. > > - Elena > > -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Dmitry Mikushin > Sent: Friday, July 12, 2013 01:51 > To: LLVM Developers Mailing List > Subject: [LLVMdev] LLVM x86 backend for Intel MIC : trying it out and > questions > > Dear all, > > I'm interested in analysing what could be done with current LLVM trunk to deliver basic Intel MIC support. Let's say, at a basic level we'd want just scalar code execution, no threading, no zmm vectors. > Attached is a verbose-in-text but functionally very simple patch that copy-pastes the x86 and x86_64 backends into 32-bit and 64-bit K1OM. At the end of the message you can find how simple LLVM-generated programs can be compiled & executed on a MIC device using this patch. > > Could you please help find answers to the following questions: > > 1) Is there actually a 32-bit mode for MIC? 32-bit ELFs are not recognized, so... > 2) MIC ISA is 32-bit ISA (no SSE/MMX) plus 256-bit AVX-like vectors? > 3) If 1 is "no" and 2 is "yes", then does the MIC calling convention permit generation of programs that use only the 32-bit x86 ISA? In other words, in the common case, does the calling convention require use of zmm registers (e.g. to return a double value) even in scalar programs? > > Thanks, > - D. > > === > > $ cat hello.c > #include <stdio.h> > > int main() > { > printf("Hello, Intel MIC!\n"); > return 0; > } > > $ PATH=$PATH:~/rpmbuild/CHROOT/opt/kernelgen/usr/bin clang -emit-llvm -c hello.c -o - | PATH=$PATH:~/forge/llvm/install/bin/ opt -O3 -S -o hello.ll $ cat hello.ll ; ModuleID = '' 
In the end of the message you can find how simple LLVM-generated programs could be compiled & executed on MIC device, using this patch. > > Could you please help finding answers to the following questions: > > 1) Is there actually a 32-bit mode for MIC? 32-bit ELFs are not recognized, so... > 2) MIC ISA is 32-bit ISA (no SSE/MMX) plus 256-bit AVX-like vectors? > 3) If 1 is "no" and 2 is "yes", then does MIC calling convention permit generation of programs that use only 32-bit x86 ISA? In other words, in common case, does calling convention require use of zmm registers (e.g. return double value) even in scalar programs? > > Thanks, > - D. > > === > > $ cat hello.c > #include <stdio.h> > > int main() > { > printf("Hello, Intel MIC!\n"); > return 0; > } > > $ PATH=$PATH:~/rpmbuild/CHROOT/opt/kernelgen/usr/bin clang -emit-llvm -c hello.c -o - | PATH=$PATH:~/forge/llvm/install/bin/ opt -O3 -S -o hello.ll $ cat hello.ll ; ModuleID = '' > target datalayout = > "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128" > target triple = "x86_64-unknown-linux-gnu" > > @str = private unnamed_addr constant [18 x i8] c"Hello, Intel MIC!\00" > > ; Function Attrs: nounwind uwtable > define i32 @main() #0 { > entry: > %puts = tail call i32 @puts(i8* getelementptr inbounds ([18 x i8]* @str, i64 0, i64 0)) > ret i32 0 > } > > ; Function Attrs: nounwind > declare i32 @puts(i8* nocapture) #1 > > attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" > "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" > "no-infs-fp-math"="false" "no-nans-fp-math"="false" > "unsafe-fp-math"="false" "use-soft-float"="false" } attributes #1 = { > nounwind } > > $ PATH=$PATH:~/forge/llvm/install/bin/ llc hello.ll -march=k1om64 > -filetype=obj -o hello.mic.o $ objdump -d hello.mic.o > > hello.mic.o: file format elf64-k1om > > > Disassembly of section .text: > > 0000000000000000 <main>
: > 0: 55 push %rbp > 1: 48 89 e5 mov %rsp,%rbp > 4: bf 00 00 00 00 mov $0x0,%edi > 9: e8 00 00 00 00 callq e > e: 31 c0 xor %eax,%eax > 10: 5d pop %rbp > 11: c3 retq > > $ icc -mmic hello.mic.o -o hello > x86_64-k1om-linux-ld: error in hello.mic.o(.eh_frame); no .eh_frame_hdr table will be created. > > $ /opt/intel/mic/bin/micnativeloadex ./hello Hello, Intel MIC! > --------------------------------------------------------------------- > Intel Israel (74) Limited > > This e-mail and any attachments may contain confidential material for > the sole use of the intended recipient(s). Any review or distribution > by others is strictly prohibited. If you are not the intended > recipient, please contact the sender and delete all copies. > --------------------------------------------------------------------- Intel Israel (74) Limited This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. From dmitry at kernelgen.org Mon Jul 15 06:04:15 2013 From: dmitry at kernelgen.org (Dmitry Mikushin) Date: Mon, 15 Jul 2013 15:04:15 +0200 Subject: [LLVMdev] LLVM x86 backend for Intel MIC : trying it out and questions In-Reply-To: References: Message-ID: Hello Elena, > There is no 32-bit KNC. Are you sure about this? From "System V Application Binary Interface K1OM Architecture Processor Supplement Version 1.0", p. 124: | A.1 Execution of 32-bit Programs | | The K1OM processors are able to execute 64-bit K1OM and also 32-bit ia32 programs. I'm really really looking for this opportunity, because we want to extend our kernel code generation capabilities [1] with MIC support. > No, 256-bit vectors are not supported. KNC is scalar ISA (Knights Corner supports a subset of the Intel 64 Architecture instructions) + 512-bit vectors + masks Of course, 512-bit, that was my typo, sorry. 
> Please check what ICC does. X87 registers are supported. Checked. Unfortunately ICC does use zmm in scalar 64-bit programs, which requires new ABI in LLVM. - D. [1] http://www.old.inf.usi.ch/file/pub/75/tech_report2013.pdf From renato.golin at linaro.org Mon Jul 15 06:48:16 2013 From: renato.golin at linaro.org (Renato Golin) Date: Mon, 15 Jul 2013 14:48:16 +0100 Subject: [LLVMdev] Enabling the SLP vectorizer by default for -O3 In-Reply-To: References: <96EBFCFE-8AEA-428D-A72D-2DDD0335CB95@apple.com> Message-ID: Hi Nadav, I think it's a great idea to have the slp vectorizer enabled, but maybe we should trim the horrible cases first (regressions, +5% compile time, etc). I don't mind sub-5% compile-time increase in O3, nor I mind sub-1% regressions in performance on some benchmarks IFF the majority of the benchmarks improve. On 15 July 2013 06:55, Nadav Rotem wrote: > I suspected that the problem is the movupd's that load xmm0 and xmm1. > I've seen this before on ARM, and I agree, it looks like the load is constrained by some other condition or pipeline stall before that. cheers, --renato -------------- next part -------------- An HTML attachment was scrubbed... URL: From eliben at google.com Mon Jul 15 08:32:58 2013 From: eliben at google.com (Eli Bendersky) Date: Mon, 15 Jul 2013 08:32:58 -0700 Subject: [LLVMdev] Special cased global-to-local-in-main replacement in GlobalOpt In-Reply-To: References: Message-ID: Ping? On Mon, Jul 8, 2013 at 4:50 PM, Eli Bendersky wrote: > Hello, > > GlobalOpt has an interesting special-case optimization for globals that > are only accessed within "main". These globals are replaced by allocas > within the "main" function (and the GV itself is deleted). 
The full > condition for this happening is: > > // If this is a first class global and has only one accessing function > // and this function is main (which we know is not recursive we can make > // this global a local variable) we replace the global with a local > alloca > // in this function. > // > // NOTE: It doesn't make sense to promote non single-value types since we > // are just replacing static memory to stack memory. > // > // If the global is in different address space, don't bring it to stack. > if (!GS.HasMultipleAccessingFunctions && > GS.AccessingFunction && !GS.HasNonInstructionUser && > GV->getType()->getElementType()->isSingleValueType() && > GS.AccessingFunction->getName() == "main" && > GS.AccessingFunction->hasExternalLinkage() && > GV->getType()->getAddressSpace() == 0) { > > From today's discussion on IRC, there appear to be two problems with this > approach: > > 1) The hard-coding of "main" to mean "entry point to the code" that only > dynamically runs once. > 2) Assuming that "main" cannot be recursive (in the general sense). > > (1) is a problem for non-traditional compilation flows such as simply JIT > of freestanding code where "main" is not the entry point; another case is > PNaCl, where "main" is not the entry point ("_start" is), and others where > parts of the runtime environment are included in the IR together with the > user code. This is not the only place where the name "main" is hard-coded > within the LLVM code base, but it's a good example. > > (2) is a problem because the C standard, unlike the C++ standard, says > nothing about "main" not being recursive. C++11 says in 3.6.1: "The > function main shall not be used within a program.". C does not appear to > mention such a restriction, which may make the optimization invalid for C. 
> > A number of possible solutions were raised: some sort of function > attribute that marks an entry point, module-level entry point, documenting > that LLVM assumes that the entry point is always renamed to "main", etc. > These mostly address (1) but not (2). > > Any thoughts and suggestions are welcome. > > Eli > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shuxin.llvm at gmail.com Mon Jul 15 10:16:24 2013 From: shuxin.llvm at gmail.com (Shuxin Yang) Date: Mon, 15 Jul 2013 10:16:24 -0700 Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage. In-Reply-To: <344919F0-3ED1-4EED-865D-8AC273821A3B@apple.com> References: <51E087E2.5040101@gmail.com> <344919F0-3ED1-4EED-865D-8AC273821A3B@apple.com> Message-ID: <51E42E68.7010402@gmail.com> On 7/14/13 5:56 PM, Andrew Trick wrote: > > On Jul 12, 2013, at 3:49 PM, Shuxin Yang > wrote: > >> 3.2 Compile partitions independently >> -------------------------------------- >> >> There are two camps: one camp advocate compiling partitions via >> multi-process, >> the other one favor multi-thread. >> >> Inside Apple compiler teams, I'm the only one belong to the 1st >> comp. I think >> while multi-proc sounds bit red-neck, it has its advantage for this >> purpose, and >> while multi-thread is certainly more eye-popping, it has its advantage >> as well. >> >> The advantage of multi-proc are: >> 1) easier to implement, the process run in its own address space. >> We don't need to worry about they can interfere with each other. >> >> 2)huge, or not unlimited, address space. >> >> The disadvantage is that it's expensive. But I guess the cost is >> almost negligible compared to the overall IPO compilation. >> >> The advantage of multi-threads I can imagine are: >> 1) sound fancy >> 2) it is light-weight >> 3) inter-thread communication is easier than IPC. >> >> Its disadvantage are: >> 1). Oftentime we will come across race-condition, and it took >> awful long time to figure it out. 
While the code is supposed >> to be multi-thread safe, we might miss some tricky case. >> Trouble-shooting a race condition is a nightmare. >> >> 2) Small address space. This is a big problem if the compiler >> is built 32-bit. In that case, the compiler is not able to bring >> lots of stuff in memory even if the HW does >> provide ample mem. >> >> 3) The thread-safe run-time lib is more expensive. >> I once linked a compiler using -lpthread (I did not have to) on a >> UNIX platform, and saw the compiler slow down by about 1/3. >> >> I'm not able to convince the folks in the other camp, neither are they >> able to convince me. I decided to implement both. Fortunately, this >> part is not difficult; it seems to be rather easy to crank out one >> within a short >> period of time. It would be interesting to compare them side-by-side, >> and see which camp loses :-). On the other hand, if we run into >> a race-condition >> problem, we choose the multi-proc version as a fall-back. > While I am a self-proclaimed multi-process red-neck, in this case I > would prefer to see a multi-threaded implementation because I want to > verify that LLVMContext can be used as advertised. I'm sure some extra > care will be needed to report failures/diagnostics, but we should > start with the assumption that this approach is not significantly > harder than multi-process because that's how we advertise the design. > > If any of the multi-threaded disadvantages you point out are real, I > would like to find out about it. > > 1. Race Conditions: We should be able to verify that the > thread-parallel vs. sequential or multi-process compilation generate > the same result. If they diverge, we would like to know about the bug > so it can be fixed--independent of LTO. > > 2. Small Address Space with LTO. We don't need to design around this > hypothetical case. > > 3. Expensive thread-safe runtime lib. We should not speculate that > platforms that we, as the LLVM community, care about have this > problem.
Let's assume that our platforms are well implemented unless > we have data to the contrary. (Personally, I would even love to use > TLS in the compiler to vastly simplify API design in the backend, but > I am not going to be popular for saying so). > > We should be able to decompose each step of compilation for debugging. Yes, of course, we should be able to save the IR before and after each major steps, and when trouble-shooting, we should be able to focus on one smaller partition. Once the problem of of the partition is fixed, we can manually link all the partition and other libs into final executable/dyn-lib. This is one important reasons for partitioning. > So the multi-process "implementation" should just be a degenerate form > of threading with a bit of driver magic if you want to automate it. > > Yes! That is why I'd like implement both. There is one difference though, with multi-proc implementation, we need to pass the /path/to/{llc/opt} to the libLTO.{so|dylib}, such that it can invoke these tools from right place. While in multi-thread implementation, we don't need this info. -------------- next part -------------- An HTML attachment was scrubbed... URL: From shuxin.llvm at gmail.com Mon Jul 15 10:24:32 2013 From: shuxin.llvm at gmail.com (Shuxin Yang) Date: Mon, 15 Jul 2013 10:24:32 -0700 Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage. In-Reply-To: <0DFE5B55-3FC3-435C-9971-040ED72D0328@apple.com> References: <51E087E2.5040101@gmail.com> <0DFE5B55-3FC3-435C-9971-040ED72D0328@apple.com> Message-ID: <51E43050.1040801@gmail.com> On 7/14/13 5:57 PM, Andrew Trick wrote: > > On Jul 12, 2013, at 3:49 PM, Shuxin Yang > wrote: > >> 6) Miscellaneous >> =========== >> Will partitioning degrade performance in theory. I think it >> depends on the definition of >> performance. If performance means execution-time, I guess it dose not. >> However, if performance includes code-size, I think it may have some >> negative impact. 
>> Following are a few scenarios: >> >> - constants generated by the post-IPO passes are not shared across >> partitions >> - dead funcs may be detected during the post-IPO stage, and they may >> not be deleted. > I don't know if it's feasible, but stable linker output, independent > of the partitioning, is highly desirable. In theory, it is not possible. But I guess in practice, it is almost stable. > One of the most irritating performance regressions to track down > involves different versions of the host linker. If partitioning > decisions are thrown into the mix, this could be annoying. Is it > possible for the final link to do a better job cleaning up? > > If the partition's tweaking parameters remain unchanged, the partition should remain unchanged as well. -------------- next part -------------- An HTML attachment was scrubbed... URL: From A31Chris at juno.com Mon Jul 15 10:14:43 2013 From: A31Chris at juno.com (Chris ) Date: Mon, 15 Jul 2013 10:14:43 -0700 Subject: [LLVMdev] Need compiler retargetted for legacy console system Message-ID: We are a community of enthusiasts for the Atari Jaguar console system http://en.wikipedia.org/wiki/Atari_Jaguar This system has NEVER HAD a cross-compiler for its GPU other than briefly the one John Carmack created to make Doom for the system. But he lost that long ago and couldn't release it if he had it still. Now any work to get the main backbone of power out of the system's GPU must be done in time-consuming ASM and no one in the community is experienced with retargeting. We are a relatively small community and probably could never raise the 50k+ in a reasonable time that it would cost a pro team to do this for us. We could raise some money but my offer is more for those who are actually learning to do these things and NEED a target that no one has ever done before, because I think every other CPU from the 90s and onward has a compiler for it except the Jaguar.
If anyone is intrigued please get ahold of me A31Chris -AT- Juno.com and we can discuss the details like what the community can actually pay and hooking you up with dev SDKs and emulator and most likely getting you hooked up with actual hardware for perhaps free(on loan) Thank you for your time! :) -- Using Opera's revolutionary email client: http://www.opera.com/mail/ From Paul_Robinson at playstation.sony.com Mon Jul 15 10:45:51 2013 From: Paul_Robinson at playstation.sony.com (Robinson, Paul) Date: Mon, 15 Jul 2013 17:45:51 +0000 Subject: [LLVMdev] C++ ABI conformance? Message-ID: PR16537 demonstrates a defect in Clang/LLVM conformance with the Itanium C++ ABI. I poked around a little in the Clang, LLVM, and test-suite tests but didn't see anything that obviously looked like an attempt at an ABI conformance test set.
> > Is there a conformance test set that I missed? If not, does > anybody think they are privately testing ABI conformance? > Alternatively, does anybody know of a test suite for ABI > conformance? I found references to codesourcery but apparently > their new corporate owners have abandoned that particular product. The libcxxrt has some tests, but it's far from exhaustive https://github.com/pathscale/libcxxrt/tree/master/test We'd welcome more tests and or be willing to help develop them From hfinkel at anl.gov Mon Jul 15 11:00:45 2013 From: hfinkel at anl.gov (Hal Finkel) Date: Mon, 15 Jul 2013 13:00:45 -0500 (CDT) Subject: [LLVMdev] C++ ABI conformance? In-Reply-To: Message-ID: <1842676372.12058425.1373911245960.JavaMail.root@alcf.anl.gov> ----- Original Message ----- > PR16537 demonstrates a defect in Clang/LLVM conformance with the > Itanium C++ ABI. I poked around a little in the Clang, LLVM, and > test-suite tests but didn't see anything that obviously looked > like an attempt at an ABI conformance test set. > > Is there a conformance test set that I missed? If not, does > anybody think they are privately testing ABI conformance? > Alternatively, does anybody know of a test suite for ABI > conformance? I found references to codesourcery but apparently > their new corporate owners have abandoned that particular product. GCC has an ABI part of its test suite that is designed to test gcc against another compiler. Specifically the things in ./g++.dg/compat and ./gcc.dg/compat (and ./gcc.c-torture/compat). That is the best thing of which I know. -Hal > > Thanks, > --paulr > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory From hpa at zytor.com Mon Jul 15 11:40:04 2013 From: hpa at zytor.com (H. 
Peter Anvin) Date: Mon, 15 Jul 2013 11:40:04 -0700 Subject: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix In-Reply-To: References: <1373806562-30422-1-git-send-email-artagnon@gmail.com> Message-ID: <51E44204.4000802@zytor.com> On 07/14/2013 12:49 PM, Linus Torvalds wrote: > On Sun, Jul 14, 2013 at 12:30 PM, Tim Northover wrote: >> >> I don't think you've actually tested that, have you? (x86-64) > > Oh, you're right, for constants > 5 bits you have that other thing > going on. I didn't think about the fact that the constant changed in > the middle of the thread (it started out as 1). > > We use the gcc constraint "I" (0-31) in the kernel for this reason. > > Linus This is also why the Intel manuals point out that "some assemblers" can take things like: bt[l] $63,(%rsi) ... and turn it into: btl $31,4(%rsi) This is definitely the friendly thing to do toward the human programmer. Unfortunately gas doesn't, nor does e.g. NASM. -hpa From torvalds at linux-foundation.org Mon Jul 15 11:56:21 2013 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Mon, 15 Jul 2013 11:56:21 -0700 Subject: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix In-Reply-To: <51E44204.4000802@zytor.com> References: <1373806562-30422-1-git-send-email-artagnon@gmail.com> <51E44204.4000802@zytor.com> Message-ID: On Mon, Jul 15, 2013 at 11:40 AM, H. Peter Anvin wrote: > On 07/14/2013 12:49 PM, Linus Torvalds wrote: > > This is also why the Intel manuals point out that "some assemblers" can > take things like: > > bt[l] $63,(%rsi) > > ... and turn it into: > > btl $31,4(%rsi) > > This is definitely the friendly thing to do toward the human programmer. > Unfortunately gas doesn't, nor does e.g. NASM. Yeah, that's definitely a "quality of implementation" issue. Clearly "bt $63,mem" is talking about bit 63, and a quality assembler would either warn about it or just do the RightThing(tm) like the intel manual says. 
I'd actually like to say "think you" to the gas people, because gas today may not do the above, but gas today is still *lightyears* ahead of where it used to be two decades ago. Back in those dark ages, GNU as was even documented to be *only* about turning compiler output into object code, and gas was the ghastliest assembler on the planet - it silently did horrible horrible things, and didn't do *anything* user-friendly or clever. It would entirely ignore things like implied sizes from register names etc, and generate code that was obviously not at all what the user expected, but because cc1 always used explicit sizes etc and only generated very specific syntax, it "wasn't an issue". gas has improved immensely in this regard, and the fact that it silently takes a $63 and effectively turns it into $31 is something I think is not nice and not a good QoI, but considering where gas came from, I'm not going to complain about it too much. It's "understandable", even if it isn't great. But quite frankly, partly because of just how bad gas used to be wrt issues like this, I think that any other assembler should aim to be at _least_ as good as gas, if not better. Because I definitely don't want to go back to the bad old days. I've been there, done that. Assemblers that are worse than gas are not worth working with. gas should be considered the minimal implementation quality, and llvm-as should strive to do better rather than worse.. (NASM used to be *much* more pleasant to work with than gas. Maybe you should strive to make nasm DTRT wrt bt and constants, and maintain that lead?) 
Linus From torvalds at linux-foundation.org Mon Jul 15 12:00:40 2013 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Mon, 15 Jul 2013 12:00:40 -0700 Subject: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix In-Reply-To: References: <1373806562-30422-1-git-send-email-artagnon@gmail.com> <51E44204.4000802@zytor.com> Message-ID: On Mon, Jul 15, 2013 at 11:56 AM, Linus Torvalds wrote: > > I'd actually like to say "think you" to the gas people My fingers are all over the place today. "thank you". It's not even like "i" and "a" are next to each other - they're at different ends of the keyboard. Clearly I'm more used to writing "think" than "thank". Linus From hpa at zytor.com Mon Jul 15 12:00:18 2013 From: hpa at zytor.com (H. Peter Anvin) Date: Mon, 15 Jul 2013 12:00:18 -0700 Subject: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix In-Reply-To: References: <1373806562-30422-1-git-send-email-artagnon@gmail.com> <51E44204.4000802@zytor.com> Message-ID: <51E446C2.6050508@zytor.com> On 07/15/2013 11:56 AM, Linus Torvalds wrote: > > (NASM used to be *much* more pleasant to work with than gas. Maybe you > should strive to make nasm DTRT wrt bt and constants, and maintain > that lead?) > Yes, that is in fact why I took over NASM development when Simon Tatham gave up on it. I'll look at adding this in the next version, it should be easy enough. -hpa From dblaikie at gmail.com Mon Jul 15 13:01:40 2013 From: dblaikie at gmail.com (David Blaikie) Date: Mon, 15 Jul 2013 13:01:40 -0700 Subject: [LLVMdev] Fwd: Regarding scope information for variable declaration. In-Reply-To: References: <1343657370.33274.YahooMailNeo@web165005.mail.bf1.yahoo.com> <1373598022502-59268.post@n5.nabble.com> <1373869961.27602.YahooMailNeo@web165006.mail.bf1.yahoo.com> Message-ID: On Mon, Jul 15, 2013 at 2:40 AM, Eric Lu wrote: > > Thank your reply. Pankaj. > > Actually, I have done it very similar to yours. 
But I think for my demand, > it is better to implement in Front End. Maybe I will re-implement it later > in clang. Depends what you're trying to do with this scope information... if it's purely for optimization purposes then it should just be in LLVM, with the exception of something like LLVM's lifetime intrinsics, used as a hint from the frontend. > ---------- Forwarded message ---------- > From: Pankaj Gode [via LLVM] > Date: Mon, Jul 15, 2013 at 2:35 PM > Subject: Re: Regarding scope information for variable declaration. > To: "eric.lew" > > > Hi Eric, > > I was considering machine instructions to get scope information. And > variable declaration does not correspond to machine instruction, hence the > problem i.e. no scope associated with it. > If 'i' is initialized in the 'if-scope' then we get 'variable i' mapped to > correct scope as corresponding machine instruction is generated for this. > This is not a problem as we can't expect variable declaration in a machine > instruction, I thought. > > So instead of using machine instructions to collect scope information, (as > used by LexicalScope pass), > I had written code to collect scope information based on LLVM Instructions. > I did this by iterating over 'Function->BasicBlock' instead of > 'MachineFunction->MachineBasicBlock'. > const Function *F1 = MF->getFunction(); > for(Function::const_iterator BB = F1->begin(), E = F1->end(); > BB != E; ++BB) > { > for(BasicBlock::const_iterator ii = BB->begin(), ie = BB->end(); > ii != ie; ++ii) > { > const Instruction *I = ii; //I->dump();//debug > DebugLoc MIDB = I->getDebugLoc(); > } > } > Though this is an overhead as scope information exists, > but I need to collect specific information such as 'start line, end line, > start column, end column' > (End line information should be derived, as it is not obvious). > Collecting information this way allowed me to get correct scope information, > and hence I was able to map the variable declaration to the scope.
It worked > for me this way. > > > Regards, > Pankaj > > > From: eric.lew <[hidden email]> > To: [hidden email] > Sent: Friday, July 12, 2013 8:30 AM > Subject: Re: [LLVMdev] Regarding scope information for variable declaration. > > I have the same demand. Have you resolved this problems? if so, would you > share me the solution? > > Best Regards. > > Eric > > > > -- > View this message in context: > http://llvm.1065342.n5.nabble.com/Regarding-scope-information-for-variable-declaration-tp47707p59268.html > Sent from the LLVM - Dev mailing list archive at Nabble.com. > _______________________________________________ > LLVM Developers mailing list > [hidden email] http://llvm.cs.uiuc.edu/ > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > _______________________________________________ > LLVM Developers mailing list > [hidden email] http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > ________________________________ > If you reply to this email, your message will be added to the discussion > below: > http://llvm.1065342.n5.nabble.com/Regarding-scope-information-for-variable-declaration-tp47707p59345.html > To unsubscribe from Regarding scope information for variable declaration., > click here. > NAML > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From rjmccall at apple.com Mon Jul 15 15:12:51 2013 From: rjmccall at apple.com (John McCall) Date: Mon, 15 Jul 2013 15:12:51 -0700 Subject: [LLVMdev] [cfe-dev] design for an accurate ODR-checker with clang In-Reply-To: References: <20841B25-DA88-44B5-AC65-9202C6FDA0E9@apple.com> Message-ID: <04D075BA-2997-4718-90F7-C76773937BB1@apple.com> On Jul 11, 2013, at 6:13 PM, Nick Lewycky wrote: > On 11 July 2013 18:02, John McCall wrote: > On Jul 11, 2013, at 5:45 PM, Nick Lewycky wrote: > > Hi! 
A few of us over at Google think a nice feature in clang would be ODR violation checking, and we thought for a while about how to do this and wrote it down, but we aren't actively working on it at the moment nor plan to in the near future. I'm posting this to share our design and hopefully save anyone else the design work if they're interested in it. > > > > For some background, C++'s ODR rule roughly means that two definitions of the same symbol must come from "the same tokens with the same interpretation". Given the same token stream, the interpretation can be different due to different name lookup results, or different types through typedefs or using declarations, or due to a different point of instantiation in two translation units. > > > > Unlike existing approaches (the ODR checker in the gold linker for example), clang lets us do this with no false positives and very few false negatives. The basis of the idea is that we produce a hash of all the ODR-relevant pieces, and to try to pick the largest possible granularity. By granularity I mean that we would hash the entire definition of a class including all methods defined lexically inline and emit a single value for that class. > > > > The first step is to build a new visitor over the clang AST that calculates a hash of the ODR-relevant pieces of the code. (StmtProfiler doesn’t work here because it includes pointers addresses which will be different across different translation units.) Hash the outermost declaration with external-linkage. For example, given a class with a method defined inline, we start the visitor at the class, not at the method. The entirety of the class must be ODR-equivalent across two translation units, including any inline methods. > > > > Although the standard mentions that the tokens must be the same, we do not actually include the tokens in the hash. The structure of the AST includes everything about the code which is semantically relevant. 
Any false positives that would be fixed by hashing the tokens either do not impact the behaviour of the program or could be fixed by hashing more of the AST. References to globals should be hashed by name, but references to locals should be hashed by an ordinal number. > > > > Instantiated templates are also visited by the hashing visitor. If we did not, we would have false negatives where the code is not conforming due to different points of instantiation in two translation units. We can skip uninstantiated templates since they don’t affect the behaviour of the program, and we need to visit the instantiations regardless. > > > > In LLVM IR, create a new named metadata node !llvm.odr_checking which contains a list of pairs. The names do not necessarily correspond to symbols, for instance, a class will have a hash value but does not have a corresponding symbol. For ease of implementation, names should be mangled per the C++ Itanium ABI (demanglable with c++filt -t). Merging modules that contain these will need to do ODR checking as part of that link, and the resulting module will have the union of these tables. > > > > In the .o file, emit a sorted table of in a non-loadable section intended to be read by the linker. All entries in the table must be checked if any symbol from this .o file is involved in the link (note that there is no mapping from symbol to odr table name). If two .o files contain different hash values for the same name, we have detected an ODR violation and issue a diagnostic. > > > > Finally, teach the loader (RuntimeDyld) to do verification and catch ODR violations when dlopen'ing a shared library. > > This is the right basic design, but I'm curious why you're suggesting that the payload should just be a hash instead of an arbitrary string. > > What are you suggesting goes into this string? The same sorts of things that you were planning on hashing, but maybe not hashed. 
It's up to you; having a full string would let you actually show a useful error message, but it definitely inflates binary sizes. If you really think you can make this performant enough to do on every load, I can see how the latter would be important. > This isn't going to be performant enough to do unconditionally at every load no matter how much you shrink it. > > Every load of a shared object? That's not a fast operation even without odr checking, but the idea is to keep the total number of entries in the odr table small. It's less than the number of symbols, closer to the number of top-level decls. Your ABI dependencies are every declaration *that you ever rely on*. You've got to figure that that's going to be very large. For a library of any significance, I'd be expecting this check to touch about half a megabyte of data, even with a 32-bit hash and some sort of clever prefixing scheme on the symbols. That's a pretty major regression in library loading. > Also, you should have something analogous to symbol visibility as a way to tell the static linker that something only needs to be ODR-checked within a linkage unit. It would be informed by actual symbol visibility, of course. > > Great point, and that needs to flow into the .o files as well. If a class has one visibility and its method has another, we want to skip the method when hashing the class, and need to emit an additional entry for the method alone? Is that right? Class hashes should probably only include virtual methods anyway, but yes, I think this is a good starting point. What do you want in the hash for a function anyway? Almost everything is already captured by (1) the separate hashes for the nominal types mentioned and (2) the symbol mangling. You're pretty much only missing the return type. Oh, I guess you need the body's dependencies for inline functions. 
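The table scheme described in the proposal, where each object file carries a sorted list of (mangled name, hash) entries that the linker compares, reduces to a linear merge of two sorted lists. A minimal model of the check follows; the mangled names in the test are illustrative, and a real table would live in a non-loadable section rather than a std::vector:

```cpp
// Sketch of the linker-side comparison of two per-object ODR tables.
// Names present in both tables must agree on their hash.
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using OdrTable = std::vector<std::pair<std::string, uint64_t>>;  // sorted by name

// Returns the names whose hashes disagree between the two objects.
std::vector<std::string> findOdrViolations(const OdrTable &a, const OdrTable &b) {
    std::vector<std::string> bad;
    size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {   // linear merge of two sorted tables
        if (a[i].first < b[j].first) {
            ++i;
        } else if (b[j].first < a[i].first) {
            ++j;
        } else {
            if (a[i].second != b[j].second) bad.push_back(a[i].first);
            ++i; ++j;
        }
    }
    return bad;
}
```

The per-link cost is linear in the table sizes, which is why the debate above centres on how many entries a large library ends up carrying.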
> You should expect that there may be multiple hashing schemes (or versions thereof) in play and therefore build a simple prefixing scheme on your ODR symbols. > > We could put the choice of hashing algorithm in the name of the llvm named metadata node, and in the name of the section in the .o files. A header on the section sounds good. John. -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard at metafoo.co.uk Mon Jul 15 15:20:56 2013 From: richard at metafoo.co.uk (Richard Smith) Date: Mon, 15 Jul 2013 15:20:56 -0700 Subject: [LLVMdev] [cfe-dev] design for an accurate ODR-checker with clang In-Reply-To: <04D075BA-2997-4718-90F7-C76773937BB1@apple.com> References: <20841B25-DA88-44B5-AC65-9202C6FDA0E9@apple.com> <04D075BA-2997-4718-90F7-C76773937BB1@apple.com> Message-ID: On Mon, Jul 15, 2013 at 3:12 PM, John McCall wrote: > On Jul 11, 2013, at 6:13 PM, Nick Lewycky wrote: > > On 11 July 2013 18:02, John McCall wrote: > >> On Jul 11, 2013, at 5:45 PM, Nick Lewycky wrote: >> > Hi! A few of us over at Google think a nice feature in clang would be >> ODR violation checking, and we thought for a while about how to do this and >> wrote it down, but we aren't actively working on it at the moment nor plan >> to in the near future. I'm posting this to share our design and hopefully >> save anyone else the design work if they're interested in it. >> > >> > For some background, C++'s ODR rule roughly means that two definitions >> of the same symbol must come from "the same tokens with the same >> interpretation". Given the same token stream, the interpretation can be >> different due to different name lookup results, or different types through >> typedefs or using declarations, or due to a different point of >> instantiation in two translation units. >> > >> > Unlike existing approaches (the ODR checker in the gold linker for >> example), clang lets us do this with no false positives and very few false >> negatives. 
>> [... design proposal and earlier replies quoted in full; trimmed ...]
For a library of > any significance, I'd be expecting this check to touch about half a > megabyte of data, even with a 32-bit hash and some sort of clever prefixing > scheme on the symbols. That's a pretty major regression in library loading. > > Also, you should have something analogous to symbol visibility as a way to >> tell the static linker that something only needs to be ODR-checked within a >> linkage unit. It would be informed by actual symbol visibility, of course. >> > > Great point, and that needs to flow into the .o files as well. If a class > has one visibility and its method has another, we want to skip the method > when hashing the class, and need to emit an additional entry for the method > alone? Is that right? > > > Class hashes should probably only include virtual methods anyway, but yes, > I think this is a good starting point. > > What do you want in the hash for a function anyway? Almost everything is > already captured by (1) the separate hashes for the nominal types mentioned > and (2) the symbol mangling. You're pretty much only missing the return > type. Oh, I guess you need the body's dependencies for inline functions. > We want to enforce the C++ ODR as much as is reasonably possible, so we want to include the body for both classes and functions. That is, we explicitly want to check for cases where two functions or classes happen to have the same declaration but different definitions. (Perhaps this also clarifies why we want a hash: an unhashed string would contain as much entropy as the entirety of the source code...) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sjcrane at uci.edu Mon Jul 15 15:39:57 2013 From: sjcrane at uci.edu (Stephen Crane) Date: Mon, 15 Jul 2013 15:39:57 -0700 Subject: [LLVMdev] Command Line Flags for LTOModule Message-ID: While looking at adding a TargetOption, I saw that there is significant overlap between the options listed in llvm/CodeGen/CommandFlags.h (which are used to set TargetOptions in llc and opt) and the options in LTOModule.cpp. There are only a few extra options in CommandFlags.h, and all target options used by LTO are there. Would it make sense to use CommandFlags.h in LTOModule as well? - Stephen Crane -------------- next part -------------- An HTML attachment was scrubbed... URL: From rjmccall at apple.com Mon Jul 15 15:42:39 2013 From: rjmccall at apple.com (John McCall) Date: Mon, 15 Jul 2013 15:42:39 -0700 Subject: [LLVMdev] [cfe-dev] design for an accurate ODR-checker with clang In-Reply-To: References: <20841B25-DA88-44B5-AC65-9202C6FDA0E9@apple.com> <04D075BA-2997-4718-90F7-C76773937BB1@apple.com> Message-ID: <013C5CDA-D3BB-4E05-9B30-9963949348FA@apple.com> On Jul 15, 2013, at 3:20 PM, Richard Smith wrote: > On Mon, Jul 15, 2013 at 3:12 PM, John McCall wrote: > On Jul 11, 2013, at 6:13 PM, Nick Lewycky wrote: >> On 11 July 2013 18:02, John McCall wrote: >> On Jul 11, 2013, at 5:45 PM, Nick Lewycky wrote: >> > Hi! A few of us over at Google think a nice feature in clang would be ODR violation checking, and we thought for a while about how to do this and wrote it down, but we aren't actively working on it at the moment nor plan to in the near future. I'm posting this to share our design and hopefully save anyone else the design work if they're interested in it. >> > >> > For some background, C++'s ODR rule roughly means that two definitions of the same symbol must come from "the same tokens with the same interpretation". 
>> > [... design proposal and earlier replies quoted in full; trimmed ...]
It's less than the number of symbols, closer to the number of top-level decls. > > Your ABI dependencies are every declaration *that you ever rely on*. You've got to figure that that's going to be very large. For a library of any significance, I'd be expecting this check to touch about half a megabyte of data, even with a 32-bit hash and some sort of clever prefixing scheme on the symbols. That's a pretty major regression in library loading. > >> Also, you should have something analogous to symbol visibility as a way to tell the static linker that something only needs to be ODR-checked within a linkage unit. It would be informed by actual symbol visibility, of course. >> >> Great point, and that needs to flow into the .o files as well. If a class has one visibility and its method has another, we want to skip the method when hashing the class, and need to emit an additional entry for the method alone? Is that right? > > Class hashes should probably only include virtual methods anyway, but yes, I think this is a good starting point. > > What do you want in the hash for a function anyway? Almost everything is already captured by (1) the separate hashes for the nominal types mentioned and (2) the symbol mangling. You're pretty much only missing the return type. Oh, I guess you need the body's dependencies for inline functions. > > We want to enforce the C++ ODR as much as is reasonably possible, so we want to include the body for both classes and functions. That is, we explicitly want to check for cases where two functions or classes happen to have the same declaration but different definitions. Mmm. So you want to warn the user that two libraries using different assertion settings both use the standard library? I think warning about actual differences in code, as opposed to differences in type/vtable layout, is going to be pretty fraught with uninteresting positives, but if you want to chase that rabbit, it's your time spent. 
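The distinction John draws between differences in type/vtable layout and differences in code is worth making concrete. A layout mismatch, the uncontroversial kind to flag, looks like this; namespaces stand in for two translation units, since in real code both units would define ::Handle, which is undefined behaviour:

```cpp
// Same declaration name, different definitions: the classic ODR violation,
// simulated in one file so the example stays well-formed. The type and
// function names are made up for the illustration.
#include <cstddef>

namespace tu1 {
struct Handle {            // translation unit 1's idea of Handle
    int fd;
};
inline int space(const Handle &) { return sizeof(Handle); }
}

namespace tu2 {
struct Handle {            // translation unit 2's "same" Handle: extra member
    int fd;
    void *cache;
};
inline int space(const Handle &) { return sizeof(Handle); }
}

// The layouts differ, so code compiled against one definition but linked
// against the other reads the wrong bytes. A (name, hash) table keyed on
// the class would record two different hashes and flag the mismatch.
static_assert(sizeof(tu1::Handle) != sizeof(tu2::Handle),
              "the two definitions are not layout-compatible");
```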
Anyway, you only need to hash in function bodies for inline functions unless this is also an ELF abuse detector. (*Whether* a function is inline seems like a legitimate thing to hash for the function signature.) John. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfb at google.com Mon Jul 15 16:26:09 2013 From: jfb at google.com (JF Bastien) Date: Mon, 15 Jul 2013 16:26:09 -0700 Subject: [LLVMdev] [cfe-dev] design for an accurate ODR-checker with clang In-Reply-To: <013C5CDA-D3BB-4E05-9B30-9963949348FA@apple.com> References: <20841B25-DA88-44B5-AC65-9202C6FDA0E9@apple.com> <04D075BA-2997-4718-90F7-C76773937BB1@apple.com> <013C5CDA-D3BB-4E05-9B30-9963949348FA@apple.com> Message-ID: > Mmm. So you want to warn the user that two libraries using different > assertion settings both use the standard library? > > I think warning about actual differences in code, as opposed to differences > in type/vtable layout, is going to be pretty fraught with uninteresting > positives, but if you want to chase that rabbit, it's your time spent. It's probably desirable to be able to choose to detect ODR violations on classes or functions independently, although detecting calling convention differences in functions (without looking at the rest of the code) could also be useful. I've debugged enough debug+release shared-library mixing issues to feel that pain. 
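The assertion scenario the thread keeps returning to is easy to reproduce. <cassert> is specified to be re-includable, with assert redefined according to the current NDEBUG setting each time, so a single file can stand in for a debug TU and a release TU; the function names below are made up for the illustration:

```cpp
// Same token sequence, different interpretation: an inline function body
// containing assert compiles to different code depending on NDEBUG.
#include <cassert>

// "Debug TU": assert is active; a negative argument would abort.
static int half_checked_debug(int x) {
    assert(x >= 0 && "negative input");
    return x / 2;
}

#define NDEBUG
#include <cassert>   // re-include: assert now expands to a no-op

// "Release TU": textually identical body, but the assert has vanished.
static int half_checked_release(int x) {
    assert(x >= 0 && "negative input");
    return x / 2;
}

#undef NDEBUG
#include <cassert>   // restore active asserts for anything after this point
```

In two real translation units this would be one weak inline symbol with two different bodies, and which body the final image uses is up to the linker; that is exactly the situation a (name, hash) table comparison would flag.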
From richard at metafoo.co.uk Mon Jul 15 16:47:46 2013 From: richard at metafoo.co.uk (Richard Smith) Date: Mon, 15 Jul 2013 16:47:46 -0700 Subject: [LLVMdev] [cfe-dev] design for an accurate ODR-checker with clang In-Reply-To: <013C5CDA-D3BB-4E05-9B30-9963949348FA@apple.com> References: <20841B25-DA88-44B5-AC65-9202C6FDA0E9@apple.com> <04D075BA-2997-4718-90F7-C76773937BB1@apple.com> <013C5CDA-D3BB-4E05-9B30-9963949348FA@apple.com> Message-ID: On Mon, Jul 15, 2013 at 3:42 PM, John McCall wrote: > On Jul 15, 2013, at 3:20 PM, Richard Smith wrote: > > On Mon, Jul 15, 2013 at 3:12 PM, John McCall wrote: > >> On Jul 11, 2013, at 6:13 PM, Nick Lewycky wrote: >> >> On 11 July 2013 18:02, John McCall wrote: >> >>> On Jul 11, 2013, at 5:45 PM, Nick Lewycky wrote: >>> > Hi! A few of us over at Google think a nice feature in clang would be >>> ODR violation checking, and we thought for a while about how to do this and >>> wrote it down, but we aren't actively working on it at the moment nor plan >>> to in the near future. I'm posting this to share our design and hopefully >>> save anyone else the design work if they're interested in it. >>> > >>> > For some background, C++'s ODR rule roughly means that two definitions >>> of the same symbol must come from "the same tokens with the same >>> interpretation". Given the same token stream, the interpretation can be >>> different due to different name lookup results, or different types through >>> typedefs or using declarations, or due to a different point of >>> instantiation in two translation units. >>> > >>> > Unlike existing approaches (the ODR checker in the gold linker for >>> example), clang lets us do this with no false positives and very few false >>> negatives. The basis of the idea is that we produce a hash of all the >>> ODR-relevant pieces, and to try to pick the largest possible granularity. 
>>> [... design proposal and earlier replies quoted in full; trimmed ...]
For a library of >> any significance, I'd be expecting this check to touch about half a >> megabyte of data, even with a 32-bit hash and some sort of clever prefixing >> scheme on the symbols. That's a pretty major regression in library loading. >> >> Also, you should have something analogous to symbol visibility as a way >>> to tell the static linker that something only needs to be ODR-checked >>> within a linkage unit. It would be informed by actual symbol visibility, >>> of course. >>> >> >> Great point, and that needs to flow into the .o files as well. If a class >> has one visibility and its method has another, we want to skip the method >> when hashing the class, and need to emit an additional entry for the method >> alone? Is that right? >> >> >> Class hashes should probably only include virtual methods anyway, but >> yes, I think this is a good starting point. >> >> What do you want in the hash for a function anyway? Almost everything is >> already captured by (1) the separate hashes for the nominal types mentioned >> and (2) the symbol mangling. You're pretty much only missing the return >> type. Oh, I guess you need the body's dependencies for inline functions. >> > > We want to enforce the C++ ODR as much as is reasonably possible, so we > want to include the body for both classes and functions. That is, we > explicitly want to check for cases where two functions or classes happen to > have the same declaration but different definitions. > > > Mmm. So you want to warn the user that two libraries using different > assertion settings both use the standard library? > libstdc++ does not use assert. IIRC, nor does libc++ unless you use its "no exceptions" mode. > I think warning about actual differences in code, as opposed to > differences in type/vtable layout, is going to be pretty fraught with > uninteresting positives, but if you want to chase that rabbit, it's your > time spent. 
> For code that already passes gold's --detect-odr-violations, the extra testing for definitions of inline functions would effectively be checking that we don't have two functions that are defined from the same token sequence but are interpreted in different ways, so the uninteresting positive rate should be rather low (or if not, then we've learned something important...). For non-inline functions and classes, the checking would be more novel, so the uninteresting positive rate is hard to be sure about. > Anyway, you only need to hash in function bodies for inline functions > unless this is also an ELF abuse dectector. (*Whether* a function is > inline seems like a legitimate thing to hash for the function signature.) > Giving different definitions (for either functions or classes) in different source files is one of the things we'd like to catch (although there are probably more direct ways to do so than a full ODR checker, such as maybe -Wmissing-prototypes). -------------- next part -------------- An HTML attachment was scrubbed... URL: From rjmccall at apple.com Mon Jul 15 17:12:54 2013 From: rjmccall at apple.com (John McCall) Date: Mon, 15 Jul 2013 17:12:54 -0700 Subject: [LLVMdev] [cfe-dev] design for an accurate ODR-checker with clang In-Reply-To: References: <20841B25-DA88-44B5-AC65-9202C6FDA0E9@apple.com> <04D075BA-2997-4718-90F7-C76773937BB1@apple.com> <013C5CDA-D3BB-4E05-9B30-9963949348FA@apple.com> Message-ID: On Jul 15, 2013, at 4:47 PM, Richard Smith wrote: > On Mon, Jul 15, 2013 at 3:42 PM, John McCall wrote: > On Jul 15, 2013, at 3:20 PM, Richard Smith wrote: >> On Mon, Jul 15, 2013 at 3:12 PM, John McCall wrote: >> On Jul 11, 2013, at 6:13 PM, Nick Lewycky wrote: >>> On 11 July 2013 18:02, John McCall wrote: >>> On Jul 11, 2013, at 5:45 PM, Nick Lewycky wrote: >>> > Hi! 
>>> > [... design proposal and earlier replies quoted in full; trimmed ...]
>> >> We want to enforce the C++ ODR as much as is reasonably possible, so we want to include the body for both classes and functions. That is, we explicitly want to check for cases where two functions or classes happen to have the same declaration but different definitions. > > Mmm. So you want to warn the user that two libraries using different assertion settings both use the standard library? > > libstdc++ does not use assert. IIRC, nor does libc++ unless you use its "no exceptions" mode. libc++ does appear to use assertions if _LIBCPP_DEBUG is defined, but that causes an ABI break. Regardless, that is clearly not the point. A single library, compiled in release mode, vends an interface with an inline function containing an assert. This is an ODR violation. > I think warning about actual differences in code, as opposed to differences in type/vtable layout, is going to be pretty fraught with uninteresting positives, but if you want to chase that rabbit, it's your time spent. > > For code that already passes gold's --detect-odr-violations, the extra testing for definitions of inline functions would effectively be checking that we don't have two functions that are defined from the same token sequence but are interpreted in different ways, so the uninteresting positive rate should be rather low (or if not, then we've learned something important...). For non-inline functions and classes, the checking would be more novel, so the uninteresting positive rate is hard to be sure about. Hands: washed. John. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hpa at zytor.com Mon Jul 15 11:40:52 2013 From: hpa at zytor.com (H. 
Peter Anvin) Date: Mon, 15 Jul 2013 11:40:52 -0700 Subject: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix In-Reply-To: <51E2FACA.3050801@goop.org> References: <1373806562-30422-1-git-send-email-artagnon@gmail.com> <51E2FACA.3050801@goop.org> Message-ID: <51E44234.5060300@zytor.com> On 07/14/2013 12:23 PM, Jeremy Fitzhardinge wrote: > The SDM entry for BT mentions that the instruction may touch 2 or 4 > bytes depending on the operand size, but doesn't specifically mention > that a 64 bit operation size touches 8 bytes - and it doesn't mention > anything at all about operand size and access size in BTR/BTS/BTC > (unless it's implied as part of the discussion about encoding the MSBs > of a constant bit offset in the offset of the addressing mode). Is that > an oversight? Most likely. I'll check with the people responsible for the SDM here at Intel. -hpa From hpa at zytor.com Mon Jul 15 11:42:34 2013 From: hpa at zytor.com (H. Peter Anvin) Date: Mon, 15 Jul 2013 11:42:34 -0700 Subject: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix In-Reply-To: <20130714211446.GP5643@tassilo.jf.intel.com> References: <1373806562-30422-1-git-send-email-artagnon@gmail.com> <51E2FAB9.9050900@goop.org> <20130714211446.GP5643@tassilo.jf.intel.com> Message-ID: <51E4429A.8090500@zytor.com> On 07/14/2013 02:14 PM, Andi Kleen wrote: > > I think best would be to just find some way to implement LOCK prefix > patching using atomic compiler intrinsics and then switch to those > That will take a long time to make happen now, and then when we change things we have to wait for gcc to catch up. At some point this always degenerates into the "do we need a kernel-specific compiler" morass. -hpa From hpa at zytor.com Mon Jul 15 11:47:09 2013 From: hpa at zytor.com (H. 
Peter Anvin) Date: Mon, 15 Jul 2013 11:47:09 -0700 Subject: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix In-Reply-To: <51E2FAB9.9050900@goop.org> References: <1373806562-30422-1-git-send-email-artagnon@gmail.com> <51E2FAB9.9050900@goop.org> Message-ID: <51E443AD.3020907@zytor.com> On 07/14/2013 12:23 PM, Jeremy Fitzhardinge wrote: > (resent without HTML) > > On 07/14/2013 05:56 AM, Ramkumar Ramachandra wrote: >> 1c54d77 (x86: partial unification of asm-x86/bitops.h, 2008-01-30) >> changed a bunch of btrl/btsl instructions to btr/bts, with the following >> justification: >> >> The inline assembly for the bit operations has been changed to remove >> explicit sizing hints on the instructions, so the assembler will pick >> the appropriate instruction forms depending on the architecture and >> the context. >> >> Unfortunately, GNU as does no such thing, and the AT&T syntax manual >> [1] contains no references to any such inference. As evidenced by the >> following experiment, gas always disambiguates btr/bts to btrl/btsl. >> Feed the following input to gas: >> >> btrl $1, 0 >> btr $1, 0 >> btsl $1, 0 >> bts $1, 0 > > When I originally did those patches, I was careful make sure that we > didn't give implied sizes to operations with only immediate and/or > memory operands because - in general - gas can't infer the operation > size from such operands. However, in the case of the bit test/set > operations, the memory access size is not really derived from the > operation size (the SDM is a bit vague), and even if it were it would be > an operation rather than semantic difference. So there's no real > problem with gas choosing 'l' as a default size in the absence of any > explicit override or constraint. > >> Check that btr matches btrl, and bts matches btsl in both cases: >> >> $ as --32 -a in.s >> $ as --64 -a in.s >> >> To avoid giving readers the illusion of such an inference, and for >> clarity, change btr/bts back to btrl/btsl. 
Also, llvm-mc refuses to >> disambiguate btr/bts automatically. > > That sounds reasonable for all other operations because it makes a real > semantic difference, but overly strict for bit operations. > To be fair, we *ought to* be able to do something like: asm volatile(LOCK_PREFIX "bts%z0 %1,%0" : BITOP_ADDR(addr) : "Ir" (nr) : "memory"); ... but some older version of gcc are broken and emit "ll" rather than "q". Furthermore, since that would actually result in *worse* code emitted overall (unnecessary REX prefixes), I'm not exactly happy on the idea. -hpa From torvalds at linux-foundation.org Mon Jul 15 11:58:56 2013 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Mon, 15 Jul 2013 11:58:56 -0700 Subject: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix In-Reply-To: <51E443AD.3020907@zytor.com> References: <1373806562-30422-1-git-send-email-artagnon@gmail.com> <51E2FAB9.9050900@goop.org> <51E443AD.3020907@zytor.com> Message-ID: On Mon, Jul 15, 2013 at 11:47 AM, H. Peter Anvin wrote: > > To be fair, we *ought to* be able to do something like: > > asm volatile(LOCK_PREFIX "bts%z0 %1,%0" > : BITOP_ADDR(addr) : "Ir" (nr) : "memory"); > > ... but some older version of gcc are broken and emit "ll" rather than > "q". Furthermore, since that would actually result in *worse* code > emitted overall (unnecessary REX prefixes), I'm not exactly happy on the > idea. I really think the "worse code" argument is the one that matters. Specifying the size of the operation is *overspecifying* things, exactly because the 32-bit encoding is actually the *better* one when possible. So it's much better to underspecify and let the assembler pick the best encoding, than it is to use an explicit size and get worse code. Which is why I brought up the issue of small constants and short jumps. I really believe this is exactly the same issue. 
Linus From maple.hl at gmail.com Mon Jul 15 19:00:00 2013 From: maple.hl at gmail.com (=?UTF-8?B?5bCP5Yia?=) Date: Tue, 16 Jul 2013 10:00:00 +0800 Subject: [LLVMdev] make lldb work Message-ID: Sorry if asked before. I'm new to LLDB, try to use it according to the lldb project site. I write some very simple code like: #include <stdio.h> int main(int argc, char **argv) { int counter = 0; while ( counter < 10 ) counter++; printf("counter: %d\n", counter); return 0; } and the session like: $ clang -g main.c $ lldb-3.4 a.out (lldb) breakpoint set -l 8 ...... (lldb) breakpoint set -l 12 ...... (lldb) breakpoint list ...... (lldb) process launch Process 1105 launched: '/home/maple/debug/arena/a.out' (i386) counter: 10 Process 1105 exited with status = 0 (0x00000000) I checked with gdb, it works well. I'm not sure whether it's a bug or a mistake in my commands. I'm using Ubuntu 12.04, and the lldb is from llvm.org/apt. It's svn186357. —— There are two kinds of beauty: one is a profound and moving equation; the other is your faintly weary smile. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chengwang at multicorewareinc.com Mon Jul 15 23:45:24 2013 From: chengwang at multicorewareinc.com (Cheng Wang) Date: Tue, 16 Jul 2013 14:45:24 +0800 Subject: [LLVMdev] llvm field Message-ID: Is there any API in LLVM that can represent a field in Java, such as the instruction "iget vx, vy, field_id"? -------------- next part -------------- An HTML attachment was scrubbed... URL: From t.p.northover at gmail.com Tue Jul 16 00:29:02 2013 From: t.p.northover at gmail.com (Tim Northover) Date: Tue, 16 Jul 2013 07:29:02 +0000 Subject: [LLVMdev] llvm field In-Reply-To: References: Message-ID: On Tue, Jul 16, 2013 at 6:45 AM, Cheng Wang wrote: > Is there any API in LLVM that can represent a field in Java, such as the > instruction "iget vx, vy, field_id"? There's no single instruction or API to do all of that. The closest equivalent would probably be having a pointer to some object and using a "getelementptr" instruction followed by a "load".
For example, in:

; class MyType { int a; int b; byte c; }
%MyType = type { i32, i32, i8 }

; byte foo(MyType obj) { return obj.c; }
; i.e. iget-byte vD, vObj, "c"?
define i8 @foo(%MyType* %obj) {
  %c_addr = getelementptr %MyType* %obj, i32 0, i32 2
  %c = load i8* %c_addr
  ret i8 %c
}

The "2" on getelementptr means calculate the address of field #2 (i.e. the i8, "c") in the object. Cheers. Tim. From elena.demikhovsky at intel.com Tue Jul 16 02:16:33 2013 From: elena.demikhovsky at intel.com (Demikhovsky, Elena) Date: Tue, 16 Jul 2013 09:16:33 +0000 Subject: [LLVMdev] LLVM x86 backend for Intel MIC : trying it out and questions In-Reply-To: References: Message-ID: 32-bit code will work. ICC does not generate 32-bit KNC code. OS is 64-bit. - Elena -----Original Message----- From: Dmitry Mikushin [mailto:dmitry at kernelgen.org] Sent: Monday, July 15, 2013 16:04 To: Demikhovsky, Elena Cc: LLVM Developers Mailing List; Rackover, Zvi Subject: Re: [LLVMdev] LLVM x86 backend for Intel MIC : trying it out and questions Hello Elena, > There is no 32-bit KNC. Are you sure about this? From "System V Application Binary Interface K1OM Architecture Processor Supplement Version 1.0", p. 124: | A.1 Execution of 32-bit Programs | | The K1OM processors are able to execute 64-bit K1OM and also 32-bit ia32 programs. I'm really really looking for this opportunity, because we want to extend our kernel code generation capabilities [1] with MIC support. > No, 256-bit vectors are not supported. KNC is scalar ISA (Knights > Corner supports a subset of the Intel 64 Architecture instructions) + > 512-bit vectors + masks Of course, 512-bit, that was my typo, sorry. > Please check what ICC does. X87 registers are supported. Checked. Unfortunately ICC does use zmm in scalar 64-bit programs, which requires new ABI in LLVM. - D.
[1] http://www.old.inf.usi.ch/file/pub/75/tech_report2013.pdf --------------------------------------------------------------------- Intel Israel (74) Limited This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. From samsonov at google.com Tue Jul 16 02:23:30 2013 From: samsonov at google.com (Alexey Samsonov) Date: Tue, 16 Jul 2013 13:23:30 +0400 Subject: [LLVMdev] Special case list files; a bug and a slowness issue In-Reply-To: <20130713003835.GA15193@pcc.me.uk> References: <20130713003835.GA15193@pcc.me.uk> Message-ID: Hi Peter! On Sat, Jul 13, 2013 at 4:38 AM, Peter Collingbourne wrote: > Hi, > > I need to be able to use a special case list file containing thousands > of entries (namely, a list of libc symbols, to be used when using > DFSan with an uninstrumented libc). Initially I built the symbol > list like this: > > fun:sym1=uninstrumented > fun:sym2=uninstrumented > fun:sym3=uninstrumented > ... > fun:sym6000=uninstrumented > > What I found was that, despite various bits of documentation [1,2], > the symbol names are matched as substrings, the root cause being that > the regular expressions built by the SpecialCaseList class do not > contain anchors. The attached unit test demonstrates the problem. > If I modify my symbol list to contain anchors: > > fun:^sym1$=uninstrumented > fun:^sym2$=uninstrumented > fun:^sym3$=uninstrumented > ... > fun:^sym6000$=uninstrumented > > the behaviour is as expected, but compiler run time is slow (on the > order of seconds), presumably because our regex library doesn't cope > with anchors very efficiently. > > I intend to resolve the substring bug and the slow run time issue > by using a StringSet for symbol patterns which do not contain regex > metacharacters. 
There would still be a regex for any other patterns, > which would have anchors added automatically. > I think that it's fine to add anchors automatically to implement the behavior described in the docs (I've LGTMed that patch). Do you want to avoid adding anchors for dfsan SpecialCaseList? > > Thoughts? > > Thanks, > -- > Peter > > [1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizer > [2] https://code.google.com/p/thread-sanitizer/wiki/Flags > -- Alexey Samsonov, MSK -------------- next part -------------- An HTML attachment was scrubbed... URL: From xiaofei.wan at intel.com Tue Jul 16 03:33:08 2013 From: xiaofei.wan at intel.com (Wan, Xiaofei) Date: Tue, 16 Jul 2013 10:33:08 +0000 Subject: [LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation In-Reply-To: <851E09B5CA368045827A32DA76E440AF019718EA@SHSMSX104.ccr.corp.intel.com> References: <851E09B5CA368045827A32DA76E440AF019718D4@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF019718EA@SHSMSX104.ccr.corp.intel.com> Message-ID: <851E09B5CA368045827A32DA76E440AF01971907@SHSMSX104.ccr.corp.intel.com> Hi, community: For the sake of our business need, I want to enable "Function-based parallel code generation" to boost up the compilation of single module, please see the details of the design and provide your feedbacks on below aspects, thanks! 1. Is this idea the proper solution for my requirement 2. This new feature will be enabled by llc -thd=N and has no impact on original llc when -thd=1 3. Can this new feature of llc be accepted by community and merged into LLVM code tree Patches The patch is divided into four separated parts, the all-in-one patch could be found here: http://llvm-reviews.chandlerc.com/D1152 Design https://docs.google.com/document/d/1QSkP6AumMCAVpgzwympD5pI3btPJt4SRgjY-vhyfySg/edit?usp=sharing Background 1. 
Our business needs to compile C/C++ source files into LLVM IR and link them into a big BC file; the big BC file is then compiled into binary code on different arch/target devices. 2. Backend code generation is a time-consuming activity that happens on the target device, which makes it an important part of the user experience. 3. Make -j or file-based parallelism can't help here since there is only one big BC file; function-based parallel LLVM backend code generation is a good solution to improve compilation time, and it fully utilizes multiple cores. Overall design strategy and goal 1. Generate exactly the same binary as the single-thread output 2. No impact on single-thread performance & conformance 3. Little impact on the LLVM code infrastructure Current status and test result 1. Parallel llc can generate the same code as a single thread (verified with "objdump -d"), and it passes a 10-hour stress test across all performance benchmarks 2. Parallel llc can deliver a ~2.9X performance gain on a Xeon server with 4 threads Thanks Wan Xiaofei -------------- next part -------------- A non-text attachment was scrubbed... Name: Parallel.CG.7z Type: application/octet-stream Size: 24682 bytes Desc: Parallel.CG.7z URL: From chandlerc at google.com Tue Jul 16 03:46:56 2013 From: chandlerc at google.com (Chandler Carruth) Date: Tue, 16 Jul 2013 03:46:56 -0700 Subject: [LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation In-Reply-To: <851E09B5CA368045827A32DA76E440AF01971907@SHSMSX104.ccr.corp.intel.com> References: <851E09B5CA368045827A32DA76E440AF019718D4@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF019718EA@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF01971907@SHSMSX104.ccr.corp.intel.com> Message-ID: While I think the end goal you're describing is close to the correct one, I see the high-level strategy for getting there somewhat differently: 1) The code generators are only one collection of function passes that might be parallelized.
Many others might also be parallelized profitably. The design for parallelism within LLVM's pass management infrastructure should be sufficiently generic to express all of these use cases. 2) The idea of having multiple pass managers necessitates (unless I misunderstand) duplicating a fair amount of state. For example, the caches in immutable analysis passes would no longer be shared, etc. I think that is really unfortunate, and would prefer instead to use parallelizing pass managers that are in fact responsible for the scheduling of passes. 3) It doesn't provide a strategy for parallelizing the leaves of a CGSCC pass manager which is where a significant portion of the potential parallelism is available within the middle end. 4) It doesn't deal with the (numerous) parts of LLVM that are not actually thread safe today. They may happen to work with the code generators you're happening to test, but there is no guarantee. Notable things to think about here are computing new types, the use-def lists of globals, commandline flags, and static state variables. While our intent has been to avoid problems with the last two that could preclude parallelism, it seems unlikely that we have succeeded without thorough testing to this point. Instead, I fear we have leaned heavily on the crutch of one-thread-per-LLVMContext. 5) It adds more complexity onto the poorly designed pass manager infrastructure. Personally, I think that cleanups to the design and architecture of the pass manager should be prioritized above adding new functionality like parallelism. However, so far no one has really had time to do this (including myself). While I would like to have time in the future to do this, as with everything else in OSS, it won't be real until the patches start flowing. 
On Tue, Jul 16, 2013 at 3:33 AM, Wan, Xiaofei wrote: > Hi, community: > > For the sake of our business need, I want to enable "Function-based > parallel code generation" to boost up the compilation of single module, > please see the details of the design and provide your feedbacks on below > aspects, thanks! > 1. Is this idea the proper solution for my requirement > 2. This new feature will be enabled by llc -thd=N and has no impact on > original llc when -thd=1 > 3. Can this new feature of llc be accepted by community and merged into > LLVM code tree > > Patches > The patch is divided into four separated parts, the all-in-one patch could > be found here: > http://llvm-reviews.chandlerc.com/D1152 > > Design > > https://docs.google.com/document/d/1QSkP6AumMCAVpgzwympD5pI3btPJt4SRgjY-vhyfySg/edit?usp=sharing > > > Background > 1. Our business need to compile C/C++ source files into LLVM IR and link > them into a big BC file; the big BC file is then compiled into binary code > on different arch/target devices. > 2. Backend code generation is a time-consuming activity happened on target > device which makes it an important user experience. > 3. Make -j or file based parallelism can't help here since there is only > one big BC file; function-based parallel LLVM backend code generation is a > good solution to improve compilation time which will fully utilize > multi-cores. > > Overall design strategy and goal > 1. Generate totally same binary as what single thread output > 2. No impacts on single thread performance & conformance > 3. Little impacts on LLVM code infrastructure > > Current status and test result > 1. Parallel llc can generate same code as single thread by "objdump -d", > it could pass 10 hours stress test for all performance benchmark > 2. 
Parallel llc can introduce ~2.9X performance gain on XEON sever for 4 > threads > > > Thanks > Wan Xiaofei > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rkotler at mips.com Tue Jul 16 04:05:17 2013 From: rkotler at mips.com (reed kotler) Date: Tue, 16 Jul 2013 04:05:17 -0700 Subject: [LLVMdev] eclipse and gdb Message-ID: <51E528ED.5060307@mips.com> Is anyone using Eclipse and gdb to debug llvm/clang? If so, which version of Eclipse, gdb and linux flavor. I just use gdb currently. I'm going to try using my mac also. Is anyone using xcode/lldb to debug llvm/clang? Tia. Reed From artagnon at gmail.com Tue Jul 16 04:06:27 2013 From: artagnon at gmail.com (Ramkumar Ramachandra) Date: Tue, 16 Jul 2013 16:36:27 +0530 Subject: [LLVMdev] [PATCH 2/2] X86: infer immediate forms of bit-test instructions In-Reply-To: <1373972787-30652-1-git-send-email-artagnon@gmail.com> References: <1373972787-30652-1-git-send-email-artagnon@gmail.com> Message-ID: <1373972787-30652-3-git-send-email-artagnon@gmail.com> The instruction mnemonics for the immediate forms of bit-test instructions including bt, btr and bts, btc do not work. llvm-mc barfs with: error: ambiguous instructions require an explicit suffix This is highly user-unfriendly, since we can easily infer what the user meant by inspecting $imm and translating the instruction appropriately. Do it exactly as the Intel manual describes. Note that we are the first major assembler to do this properly: neither GNU as nor NASM does the right thing currently. 
Cc: Stephen Checkoway Cc: Tim Northover Cc: Jim Grosbach Cc: Chris Lattner Signed-off-by: Ramkumar Ramachandra --- lib/Target/X86/AsmParser/X86AsmParser.cpp | 30 ++++++++++++++++++++++++++++++ test/MC/X86/x86-64.s | 7 +++++++ 2 files changed, 37 insertions(+) diff --git a/lib/Target/X86/AsmParser/X86AsmParser.cpp b/lib/Target/X86/AsmParser/X86AsmParser.cpp index 263eb5e..fba0f3c 100644 --- a/lib/Target/X86/AsmParser/X86AsmParser.cpp +++ b/lib/Target/X86/AsmParser/X86AsmParser.cpp @@ -2124,6 +2124,36 @@ ParseInstruction(ParseInstructionInfo &Info, StringRef Name, SMLoc NameLoc, } } + // Infer the immediate form of bit-test instructions without length suffix + // correctly. The register form works fine. + // bt{,r,s,..} $n, mem becomes btl $(n % 32), (mem + 4 * (n / 32)) + if (Name.startswith("bt") + && !(Name.endswith("b") || Name.endswith("w") || Name.endswith("l") || Name.endswith("q")) + && Operands.size() == 3) { + X86Operand &Op1 = *(X86Operand*)Operands.begin()[1]; + X86Operand &Op2 = *(X86Operand*)Operands.begin()[2]; + + if (Op1.isImm() && isa(Op1.getImm()) && + Op2.isMem() && isa(Op2.Mem.Disp)) { + int64_t Given_imm = cast(Op1.getImm())->getValue(); + int64_t Given_mem = cast(Op2.Mem.Disp)->getValue(); + + static_cast(Operands[0])->setTokenValue(StringRef(Name.str() + "l")); + + SMLoc Loc = Op1.getEndLoc(); + const MCExpr *Op1_transformed = MCConstantExpr::Create(Given_imm % 32, getContext()); + Operands.begin()[1] = X86Operand::CreateImm(Op1_transformed, Loc, Loc); + + Loc = Op2.getEndLoc(); + const MCExpr *Op2_transformed = MCConstantExpr::Create(Given_mem + 4 * (Given_imm / 32), + getContext()); + Operands.begin()[2] = X86Operand::CreateMem(Op2_transformed, Loc, Loc); + + delete &Op1; + delete &Op2; + } + } + return false; } diff --git a/test/MC/X86/x86-64.s b/test/MC/X86/x86-64.s index ff60969..e82db1e 100644 --- a/test/MC/X86/x86-64.s +++ b/test/MC/X86/x86-64.s @@ -694,6 +694,13 @@ movl 0, %eax // CHECK: movl 0, %eax # encoding: 
[0x8b,0x04,0x25,0x00,0x00,0x00 // CHECK: encoding: [0x75,A] jnz 0 +// Infer immediate form of bit-test instructions without suffix +bt $1, 0 // CHECK: btl $1, 0 # encoding: [0x0f,0xba,0x24,0x25,0x00,0x00,0x00,0x00,0x01] +btl $1, 0 // CHECK: btl $1, 0 # encoding: [0x0f,0xba,0x24,0x25,0x00,0x00,0x00,0x00,0x01] +bt $63, 0 // CHECK: btl $31, 4 # encoding: [0x0f,0xba,0x24,0x25,0x04,0x00,0x00,0x00,0x1f] +btr $63, 0 // CHECK: btrl $31, 4 # encoding: [0x0f,0xba,0x34,0x25,0x04,0x00,0x00,0x00,0x1f] +btq $63, 0 // CHECK: btq $63, 0 # encoding: [0x48,0x0f,0xba,0x24,0x25,0x00,0x00,0x00,0x00,0x3f] + // rdar://8017515 btq $0x01,%rdx // CHECK: btq $1, %rdx -- 1.8.3.2.736.g869de25 From xiaofei.wan at intel.com Tue Jul 16 04:37:29 2013 From: xiaofei.wan at intel.com (Wan, Xiaofei) Date: Tue, 16 Jul 2013 11:37:29 +0000 Subject: [LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation In-Reply-To: References: <851E09B5CA368045827A32DA76E440AF019718D4@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF019718EA@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF01971907@SHSMSX104.ccr.corp.intel.com> Message-ID: <851E09B5CA368045827A32DA76E440AF019719BD@SHSMSX104.ccr.corp.intel.com> Thanks for your comments, see my reply below, thanks. Thanks Wan Xiaofei From: Chandler Carruth [mailto:chandlerc at google.com] Sent: Tuesday, July 16, 2013 6:47 PM To: Wan, Xiaofei Cc: LLVM Developers Mailing List (llvmdev at cs.uiuc.edu) Subject: Re: [LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation While I think the end goal you're describing is close to the correct one, I see the high-level strategy for getting there somewhat differently: 1) The code generators are only one collection of function passes that might be parallelized. Many others might also be parallelized profitably. 
The design for parallelism within LLVM's pass management infrastructure should be sufficiently generic to express all of these use cases. [xiaofei], yes, only passes in function pass manager are parallelized, it is enough to meet our requirement since 95% of time in llc are in function passes. 2) The idea of having multiple pass managers necessitates (unless I misunderstand) duplicating a fair amount of state. For example, the caches in immutable analysis passes would no longer be shared, etc. I think that is really unfortunate, and would prefer instead to use parallelizing pass managers that are in fact responsible for the scheduling of passes. [ Xiaofei ] For immutable passes, they are not parallelized, actually, only passes in function pass manager are parallelized The reason why I start multiple pass manager is, make original code infrastructure stable, each thread has its own PM, then consume functions independently. 3) It doesn't provide a strategy for parallelizing the leaves of a CGSCC pass manager which is where a significant portion of the potential parallelism is available within the middle end. 4) It doesn't deal with the (numerous) parts of LLVM that are not actually thread safe today. They may happen to work with the code generators you're happening to test, but there is no guarantee. Notable things to think about here are computing new types, the use-def lists of globals, commandline flags, and static state variables. While our intent has been to avoid problems with the last two that could preclude parallelism, it seems unlikely that we have succeeded without thorough testing to this point. Instead, I fear we have leaned heavily on the crutch of one-thread-per-LLVMContext. [Xiaofei] we consider all the aspects you are listing, otherwise, it can’t pass any test cases, now we could pass all benchmarks and almost all unit test cases especial cases. 5) It adds more complexity onto the poorly designed pass manager infrastructure. 
Personally, I think that cleanups to the design and architecture of the pass manager should be prioritized above adding new functionality like parallelism. However, so far no one has really had time to do this (including myself). While I would like to have time in the future to do this, as with everything else in OSS, it won't be real until the patches start flowing. [xiaofei] this feature doesn’t rely on PM too much; it doesn’t need to change PM infrastructure On Tue, Jul 16, 2013 at 3:33 AM, Wan, Xiaofei > wrote: Hi, community: For the sake of our business need, I want to enable "Function-based parallel code generation" to boost up the compilation of single module, please see the details of the design and provide your feedbacks on below aspects, thanks! 1. Is this idea the proper solution for my requirement 2. This new feature will be enabled by llc -thd=N and has no impact on original llc when -thd=1 3. Can this new feature of llc be accepted by community and merged into LLVM code tree Patches The patch is divided into four separated parts, the all-in-one patch could be found here: http://llvm-reviews.chandlerc.com/D1152 Design https://docs.google.com/document/d/1QSkP6AumMCAVpgzwympD5pI3btPJt4SRgjY-vhyfySg/edit?usp=sharing Background 1. Our business need to compile C/C++ source files into LLVM IR and link them into a big BC file; the big BC file is then compiled into binary code on different arch/target devices. 2. Backend code generation is a time-consuming activity happened on target device which makes it an important user experience. 3. Make -j or file based parallelism can't help here since there is only one big BC file; function-based parallel LLVM backend code generation is a good solution to improve compilation time which will fully utilize multi-cores. Overall design strategy and goal 1. Generate totally same binary as what single thread output 2. No impacts on single thread performance & conformance 3. 
Little impacts on LLVM code infrastructure Current status and test result 1. Parallel llc can generate same code as single thread by "objdump -d", it could pass 10 hours stress test for all performance benchmark 2. Parallel llc can introduce ~2.9X performance gain on XEON sever for 4 threads Thanks Wan Xiaofei _______________________________________________ LLVM Developers mailing list LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From rkotler at mips.com Tue Jul 16 04:51:58 2013 From: rkotler at mips.com (reed kotler) Date: Tue, 16 Jul 2013 04:51:58 -0700 Subject: [LLVMdev] llvm-ld ??? Message-ID: <51E533DE.5070006@mips.com> What was llvm-ld and why was it removed? I see this reference regarding the Eclipse plugin for llvm. http://marketplace.eclipse.org/content/llvm-toolchain-eclipse-cdt#.UeUzW47Slk8 -------------- next part -------------- An HTML attachment was scrubbed... URL: From alon.mishne at intel.com Tue Jul 16 05:10:54 2013 From: alon.mishne at intel.com (Mishne, Alon) Date: Tue, 16 Jul 2013 12:10:54 +0000 Subject: [LLVMdev] eclipse and gdb Message-ID: <02A5AC145ACFA54DA9FADFD78F1AE22E02EC09C0@HASMSX104.ger.corp.intel.com> I'm using Eclipse with gdb to develop and debug llvm. You need to use the "Eclipse IDE for C/C++ Developers" version for that (also sometimes called "Eclipse CDT" - CDT is the C++ component), though you can also download any other Eclipse version and just install CDT on top of it. As for versions, it's best to use the most recent version you can find. Should work fine on all Linux distros, though some advanced features require a minimum gdb version. As for creating the Eclipse project, think the simplest approach is to configure as usual, then create a "makefile project" in Eclipse and tell it to use the existing makefiles created under OBJ_ROOT - don't let Eclipse manage your makefiles. 
- Alon -------------- next part -------------- An HTML attachment was scrubbed... URL: From tscheller at apple.com Tue Jul 16 05:21:11 2013 From: tscheller at apple.com (Tilmann Scheller) Date: Tue, 16 Jul 2013 14:21:11 +0200 Subject: [LLVMdev] eclipse and gdb In-Reply-To: <51E528ED.5060307@mips.com> References: <51E528ED.5060307@mips.com> Message-ID: <8CC341AF-45FF-4C84-9E0A-D5143A6D20CD@apple.com> Hi Reed, I’ve used Eclipse for a long time to do LLVM development on Linux (both for code navigation/editing and debugging), any recent Linux distribution and version of Eclipse should be fine (even older versions should be good enough as this has been working for many years). Xcode works fine as well, I started to use Xcode exclusively when I switched to OS X. The key to make this work is to use CMake to generate project files for Eclipse/Xcode, you can do this by specifying the appropriate generator on the command line e.g. -G Xcode or -G "Eclipse CDT4 - Unix Makefiles”. Then you can just open the generated project file. Mind you, the generated projects are kind of ugly e.g. the Xcode project has like more than 200 targets but apart from that they are working fine. In terms of key bindings both Eclipse and Xcode ship with Emacs key bindings and there are plugins which allow you to use vim key bindings as well. With Eclipse I’ve been using the Viable plugin for that and for Xcode there is Xvim. Hope this helps :) Regards, Tilmann On Jul 16, 2013, at 1:05 PM, reed kotler wrote: > Is anyone using Eclipse and gdb to debug llvm/clang? > If so, which version of Eclipse, gdb and linux flavor. > > I just use gdb currently.
> > I'm going to try using my mac also. > Is anyone using xcode/lldb to debug llvm/clang? > > Tia. > > Reed > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From evan.cheng at apple.com Tue Jul 16 05:23:36 2013 From: evan.cheng at apple.com (Evan Cheng) Date: Tue, 16 Jul 2013 05:23:36 -0700 Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage. In-Reply-To: <51E087E2.5040101@gmail.com> References: <51E087E2.5040101@gmail.com> Message-ID: <1B816499-0C43-475F-911E-0496FBAE145A@apple.com> Thanks for the proposal. This is important work which is one step towards making LTO more applicable for large applications. Some comments inline. On Jul 12, 2013, at 3:49 PM, Shuxin Yang wrote: > > > 3.1.1 Figure out Partition scheme > ---------------------------------- > we randomly pick up some function and put them in a partition. > It would be nice to perform some optimization at this moment. One opt > in my mind is to reorder functions in order to reduce working-set and > improve locality. > > Unfortunately, this opt seems to be bit blind at this time, because > - CallGraph is not annotated with estimated or profiled frequency. > - some linkers don't respect the order. It seems they just > remembers the function order of the pristine input obj/fake-obj, > and enforce this order at final link (link-exec/shared-lib) stage. > > Anyway, I try to ignore all these problems, and try to perform partition > via following steps. Maybe we have some luck on some platforms: > > o. DFS the call-graph, ignoring the self-resursive edges, if freq is > available, prioritizing the edges (i.e. corresponding to call-sites) > such that frequent edges are visited first. > > o. 
Cut the DFS spanning tree obtained from the previous step bottom-up, > Each cut/partition contains a reasonable # of functions, and the aggregate > size of the functions of the partition should not exceed a predefined > threshold. I'd like to see more details about this step. How do you determine the "reasonable # of functions"? How do you define the threshold? It has to be the same for a given target / platform regardless of the configuration of the build machine, right? > > o. repeat the previous step until the Call-graph's DFS spanning tree > is empty. > > 3.1.2 Partition transformation > ------------------------------ > > This is a bit involved. There are a bunch of problems we have to tackle. > 1) When the use/def of a symbol are separated in different modules, > its attributes, like linkage and visibility, need to be changed > as well. > > [Example 1], if a symbol is flagged as "internal" to the module where > it is defined, the linkage needs to be changed into "internal" > to the executable/lib being compiled. > > [Example 2], For compile-time constants, their initialized value > needs to be cloned to the partitions where it is referenced. > The rationale is to let the post-IPO passes take advantage > of the initialized value to squeeze out some performance. > > In order to not bloat the code size, the cloned constant should > be marked "don't emit". [end of eg2] > > Being able to precisely update symbols' attributes is not only > vital to correctness, it has a significant impact on > performance as well. > > I have not yet made a thorough investigation of this issue. My > rudimentary implementation simply flags a symbol "external" when its > use/def are separated into different modules. I believe this is one > of the most difficult parts of this work. I guess it is going to > take a long time to become stable. > > 2) In order to compile each partition in a separate thread (see > Section 3.2), we have to put partitions in distinct LLVMContexts.
> > I could be wrong, but I don't find code which is able to > perform function cloning across LLVMContexts. > > My workaround in the patch is to perform function cloning in > one LLVMContext (but in a different Module, of course), then > save the module to a disk file, and load it to memory using a > new LLVMContext. > > It is a bit circuitous and expensive. Do you plan to fix this? What are the issues that prevented function cloning across multiple LLVMContexts? Evan > > One random observation: > Currently, function-scoped static variables are considered > "global variables". When cloning a function with a static variable, > the compiler has no idea if the static variables are used only by > the function being cloned, and hence separates the function > and the variables. > > I guess it would be nice if we organized symbols by their scope > instead of their lifetime. It would be convenient for this situation. > > 3.2 Compile partitions independently > -------------------------------------- > > There are two camps: one camp advocates compiling partitions via multi-process, > the other one favors multi-thread. > > Inside Apple compiler teams, I'm the only one belonging to the 1st camp. I think > while multi-proc sounds a bit red-neck, it has its advantage for this purpose, and > while multi-thread is certainly more eye-popping, it has its advantage > as well. > > The advantages of multi-proc are: > 1) it is easier to implement; each process runs in its own address space. > We don't need to worry about them interfering with each other. > > 2) huge, if not unlimited, address space. > > The disadvantage is that it's expensive. But I guess the cost is > almost negligible compared to the overall IPO compilation. > > The advantages of multi-threads I can imagine are: > 1) it sounds fancy > 2) it is light-weight > 3) inter-thread communication is easier than IPC. > > Its disadvantages are: > 1) Oftentimes we will come across race conditions, and it takes an > awfully long time to figure them out.
While the code is supposed > to be multi-thread safe, we might miss some tricky cases. > Troubleshooting race conditions is a nightmare. > > 2) Small address space. This is a big problem if the compiler > is built 32-bit. In that case, the compiler is not able to bring > lots of stuff into memory even if the HW does > provide ample memory. > > 3) The thread-safe run-time lib is more expensive. > I once linked a compiler using -lpthread (I did not have to) on a > UNIX platform, and saw the compiler slow down by about 1/3. > > I'm not able to convince the folks in the other camp, neither are they > able to convince me. I decided to implement both. Fortunately, this > part is not difficult; it seems to be rather easy to crank out one within a short > period of time. It would be interesting to compare them side-by-side, > and see which camp loses :-). On the other hand, if we run into race-condition > problems, we choose the multi-proc version as a fall-back. > > Regardless of which technique is used to compile partitions > independently, in order to judiciously and adaptively choose an appropriate > parallel-factor, the compiler certainly needs a lib which is able to > figure out the load the entire system is under. I don't know if there is > such a magic lib or not. > > 4. the tale of two kinds of linker > ---------------------------------- > > As far as I can tell, llvm supports two kinds of linker for its IPO compilation, > and the support is embodied by two sets of APIs/interfaces. > > o. Interface 1, those stuff named lto_xxxx(). > o. GNU gold interface, > The compiler interacts with GNU gold via the adapter implemented > in tools/gold/gold-plugin.cpp. > > This adapter calls interface-1 to control the IPO process. > It does not have to call the interface APIs; I think it is definitely > ok for it to call internal functions. > > The compiler used to generate a single object file from the merged > IR; now it will generate multiple of them, one for each partition.
> > So, interface 1 is *NOT* sufficient any more. > > For gold linker users, it is easy to make them happy just by > hacking the adapter, informing the linker of the new input object > files. This is done transparently; the users don't need to install a new ld. > > For those systems which invoke ld interacting with the libLTO.{so,dylib}, > it has to accept the new APIs I added to interface-1 in order to > enable the new functionality. Or maybe we can invoke '/the/path/to/ld -r *.o -o merged.o' > and feed the merged.o to the linker (this would keep the interface > intact)? Unfortunately, it does not work at all; how can I know the path > to the ld? The libLTO.{so,dylib} is invoked as a plugin; it cannot see the argv. > How about hacking it by adding a nasty flag pointing to the right ld? > Well, it works. However, I don't believe many people would like to do it this way; > that means I lose a huge number of "QA" who are working hard for this compiler. > > What's wrong with interface-1? The ld side is more active than > the compiler side; however, in concept the IPO is driven by the compiler side. > This means this interface is changing over time. > > In contrast, the gold interface (as I reverse-engineered from the adapter > code) is more symbol-centric, taking little of the IPO process into account. > That interface is simple and stable. > > 5. the rudimentary implementation > --------------------------------- > > I made it work for bzip2 from cpu2kint yesterday. bzip2 is a "tiny" > program; I intentionally lowered the partition size to get 3 partitions. > There is no comment in the code, and it definitely needs a rewrite. I > just checked the correctness (with the ref input), and I didn't measure how much > it degrades the performance. (Due to the problem I have not yet got a chance > to tackle, see section 3.1.2, the symbol attribute stuff.) > > The control flow basically is: > 1. add a module pass to the IPO pass-manager, and figure > out the partition scheme. > > 2) physically partition the merged module.
> the IR and the obj of each partition are placed in a new dir, "llvmipo" by default > > -- > ls llvmipo/ > Makefile merged part1.bc part1.o part2.bc part2.o part3.bc part3.o > -- > > 3) For demo purposes, I drive the post-IPO stage via a makefile, which encapsulates > hacks and other nasty stuff. > > NOTE that the post-IPO passes in my hack contain only CodeGen passes. We need to > reorganize PassManagerBuilder::populateLTOPassManager(), which intermingles > IPO passes with intra-proc scalar passes; we need to separate them and move the intra-proc > scalar passes to the post-IPO stage. > > > .PHONY = all > > BC = part1.bc part2.bc part3.bc > OBJ = ${BC:.bc=.o} > > all : merged > %.o : %.bc > $(HOME)/tmp/lto.llc -filetype=obj $+ -o $@ > > merged : $(OBJ) > /usr/bin/ld $+ -r -o $@ > > 4. as the Makefile suggests, the *.o of the partitions are linked into a single obj "merged" > and fed back to the linker. > > > 6) Miscellaneous > =========== > Will partitioning degrade performance in theory? I think it depends on the definition of > performance. If performance means execution time, I guess it does not. > However, if performance includes code size, I think it may have some negative impact. > Following are a few scenarios: > > - constants generated by the post-IPO passes are not shared across partitions > - dead funcs may be detected during the post-IPO stage, and they may not be deleted. > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From evan.cheng at apple.com Tue Jul 16 05:23:43 2013 From: evan.cheng at apple.com (Evan Cheng) Date: Tue, 16 Jul 2013 05:23:43 -0700 Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage.
In-Reply-To: <344919F0-3ED1-4EED-865D-8AC273821A3B@apple.com> References: <51E087E2.5040101@gmail.com> <344919F0-3ED1-4EED-865D-8AC273821A3B@apple.com> Message-ID: <35E5093D-0D3F-43C0-8239-E2935E30AB2F@apple.com> +1 On Jul 14, 2013, at 5:56 PM, Andrew Trick wrote: > > On Jul 12, 2013, at 3:49 PM, Shuxin Yang wrote: > >> 3.2 Compile partitions independently >> -------------------------------------- >> >> There are two camps: one camp advocate compiling partitions via multi-process, >> the other one favor multi-thread. >> >> Inside Apple compiler teams, I'm the only one belong to the 1st comp. I think >> while multi-proc sounds bit red-neck, it has its advantage for this purpose, and >> while multi-thread is certainly more eye-popping, it has its advantage >> as well. >> >> The advantage of multi-proc are: >> 1) easier to implement, the process run in its own address space. >> We don't need to worry about they can interfere with each other. >> >> 2)huge, or not unlimited, address space. >> >> The disadvantage is that it's expensive. But I guess the cost is >> almost negligible compared to the overall IPO compilation. >> >> The advantage of multi-threads I can imagine are: >> 1) sound fancy >> 2) it is light-weight >> 3) inter-thread communication is easier than IPC. >> >> Its disadvantage are: >> 1). Oftentime we will come across race-condition, and it took >> awful long time to figure it out. While the code is supposed >> to be mult-thread safe, we might miss some tricky case. >> Trouble-shooting race condition is a nightmare. >> >> 2) Small address space. This is big problem if we the compiler >> is built 32-bit . In that case, the compiler is not able to bring >> lots of stuff in memory even if the HW dose >> provide ample mem. >> >> 3) The thread-safe run-time lib is more expensive. >> I once linked a compiler using -lpthread (I dose not have to) on a >> UNIX platform, and saw the compiler slow down by about 1/3. 
>> >> I'm not able to convince the folks in other camp, neither are they >> able to convince me. I decide to implement both. Fortunately, this >> part is not difficult, it seems to be rather easy to crank out one within short >> period of time. It would be interesting to compare them side-by-side, >> and see which camp lose:-). On the other hand, if we run into race-condition >> problem, we choose multi-proc version as a fall-back. > > While I am a self-proclaimed multi-process red-neck, in this case I would prefer to see a multi-threaded implementation because I want to verify that LLVMContext can be used as advertised. I'm sure some extra care will be needed to report failures/diagnostics, but we should start with the assumption that this approach is not significantly harder than multi-process because that's how we advertise the design. > > If any of the multi-threaded disadvantages you point out are real, I would like to find out about it. > > 1. Race Conditions: We should be able to verify that the thread-parallel vs. sequential or multi-process compilation generate the same result. If they diverge, we would like to know about the bug so it can be fixed--independent of LTO. > > 2. Small Address Space with LTO. We don't need to design around this hypothetical case. > > 3. Expensive thread-safe runtime lib. We should not speculate that platforms that we, as the LLVM community, care about have this problem. Let's assume that our platforms are well implemented unless we have data to the contrary. (Personally, I would even love to use TLS in the compiler to vastly simplify API design in the backend, but I am not going to be popular for saying so). > > We should be able to decompose each step of compilation for debugging. So the multi-process "implementation" should just be a degenerate form of threading with a bit of driver magic if you want to automate it. 
> > -Andy > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From evan.cheng at apple.com Tue Jul 16 05:28:12 2013 From: evan.cheng at apple.com (Evan Cheng) Date: Tue, 16 Jul 2013 05:28:12 -0700 Subject: [LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation In-Reply-To: <851E09B5CA368045827A32DA76E440AF01971907@SHSMSX104.ccr.corp.intel.com> References: <851E09B5CA368045827A32DA76E440AF019718D4@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF019718EA@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF01971907@SHSMSX104.ccr.corp.intel.com> Message-ID: <3C05A56F-310F-4816-BED5-C7DE530D712C@apple.com> Please see Shuxin's proposal on "parallelizing post-IPO stage". It seems the two projects are related. Evan On Jul 16, 2013, at 3:33 AM, "Wan, Xiaofei" wrote: > Hi, community: > > For the sake of our business need, I want to enable "Function-based parallel code generation" to boost up the compilation of single module, please see the details of the design and provide your feedbacks on below aspects, thanks! > 1. Is this idea the proper solution for my requirement > 2. This new feature will be enabled by llc -thd=N and has no impact on original llc when -thd=1 > 3. Can this new feature of llc be accepted by community and merged into LLVM code tree > > Patches > The patch is divided into four separated parts, the all-in-one patch could be found here: > http://llvm-reviews.chandlerc.com/D1152 > > Design > https://docs.google.com/document/d/1QSkP6AumMCAVpgzwympD5pI3btPJt4SRgjY-vhyfySg/edit?usp=sharing > > > Background > 1. Our business need to compile C/C++ source files into LLVM IR and link them into a big BC file; the big BC file is then compiled into binary code on different arch/target devices. > 2. 
Backend code generation is a time-consuming activity happened on target device which makes it an important user experience. > 3. Make -j or file based parallelism can't help here since there is only one big BC file; function-based parallel LLVM backend code generation is a good solution to improve compilation time which will fully utilize multi-cores. > > Overall design strategy and goal > 1. Generate totally same binary as what single thread output > 2. No impacts on single thread performance & conformance > 3. Little impacts on LLVM code infrastructure > > Current status and test result > 1. Parallel llc can generate same code as single thread by "objdump -d", it could pass 10 hours stress test for all performance benchmark > 2. Parallel llc can introduce ~2.9X performance gain on XEON sever for 4 threads > > > Thanks > Wan Xiaofei > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From xiaofei.wan at intel.com Tue Jul 16 07:23:03 2013 From: xiaofei.wan at intel.com (Wan, Xiaofei) Date: Tue, 16 Jul 2013 14:23:03 +0000 Subject: [LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation In-Reply-To: <3C05A56F-310F-4816-BED5-C7DE530D712C@apple.com> References: <851E09B5CA368045827A32DA76E440AF019718D4@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF019718EA@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF01971907@SHSMSX104.ccr.corp.intel.com> <3C05A56F-310F-4816-BED5-C7DE530D712C@apple.com> Message-ID: <851E09B5CA368045827A32DA76E440AF01971B33@SHSMSX104.ccr.corp.intel.com> Yes, the purpose is similar; we started this job last year. But Shuxin's solution is module-based (correct me if I am wrong); we tried this solution and it failed for many reasons, which are described in my design document
https://docs.google.com/document/d/1QSkP6AumMCAVpgzwympD5pI3btPJt4SRgjY-vhyfySg/edit?usp=sharing We need to discuss the two solutions, compare them, and then adopt one. The biggest differences between module-based parallelism and function-based parallelism are: 1. how to partition a module into different pieces which consume similar time - it is a difficult question; 2. how to make sure the generated binary is the same each time; 3. if 2 can't be achieved, it is difficult to validate the correctness of the parallelism. Thanks Wan Xiaofei -----Original Message----- From: Evan Cheng [mailto:evan.cheng at apple.com] Sent: Tuesday, July 16, 2013 8:28 PM To: Wan, Xiaofei Cc: LLVM Developers Mailing List (llvmdev at cs.uiuc.edu); Shuxin Yang Subject: Re: [LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation Please see Shuxin's proposal on "parallelizing post-IPO stage". It seems the two projects are related. Evan On Jul 16, 2013, at 3:33 AM, "Wan, Xiaofei" wrote: > Hi, community: > > For the sake of our business need, I want to enable "Function-based parallel code generation" to boost up the compilation of single module, please see the details of the design and provide your feedbacks on below aspects, thanks! > 1. Is this idea the proper solution for my requirement 2. This new > feature will be enabled by llc -thd=N and has no impact on original > llc when -thd=1 3. Can this new feature of llc be accepted by > community and merged into LLVM code tree > > Patches > The patch is divided into four separated parts, the all-in-one patch could be found here: > http://llvm-reviews.chandlerc.com/D1152 > > Design > https://docs.google.com/document/d/1QSkP6AumMCAVpgzwympD5pI3btPJt4SRgj > Y-vhyfySg/edit?usp=sharing > > > Background > 1. Our business need to compile C/C++ source files into LLVM IR and link them into a big BC file; the big BC file is then compiled into binary code on different arch/target devices. > 2.
Backend code generation is a time-consuming activity happened on target device which makes it an important user experience. > 3. Make -j or file based parallelism can't help here since there is only one big BC file; function-based parallel LLVM backend code generation is a good solution to improve compilation time which will fully utilize multi-cores. > > Overall design strategy and goal > 1. Generate totally same binary as what single thread output 2. No > impacts on single thread performance & conformance 3. Little impacts > on LLVM code infrastructure > > Current status and test result > 1. Parallel llc can generate same code as single thread by "objdump > -d", it could pass 10 hours stress test for all performance benchmark > 2. Parallel llc can introduce ~2.9X performance gain on XEON sever for > 4 threads > > > Thanks > Wan Xiaofei > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From KYLE.DUNN at UCDENVER.EDU Tue Jul 16 07:27:12 2013 From: KYLE.DUNN at UCDENVER.EDU (Dunn, Kyle) Date: Tue, 16 Jul 2013 14:27:12 +0000 Subject: [LLVMdev] Instantiating Target-Specifc ASM Parser Message-ID: Hello, I am working on backend development and would like to utilize my target's MCAsmParser inside of an MCInst-level class implementation. I noticed that the AsmParser is registered with the target registry however I am having no luck grepping for a "template" of how to instantiate it and have yet to find specific documentation on how it is done. Any ideas or help is greatly appreciated! Cheers, Kyle Dunn -------------- next part -------------- An HTML attachment was scrubbed... URL: From rafael.espindola at gmail.com Tue Jul 16 07:52:16 2013 From: rafael.espindola at gmail.com (=?UTF-8?Q?Rafael_Esp=C3=ADndola?=) Date: Tue, 16 Jul 2013 10:52:16 -0400 Subject: [LLVMdev] llvm-ld ??? 
In-Reply-To: <51E533DE.5070006@mips.com> References: <51E533DE.5070006@mips.com> Message-ID: It was a strange mix of llvm-link and a system linker. It was removed because llvm-link is sufficient for testing and as a developer tool, but llvm-ld was not even close to being a full system linker. The use cases covered by llvm-ld should be handled by * libLTO when using ld64 on OS X * The gold plugin when using gold or very new versions of bfd ld. * lld. On 16 July 2013 07:51, reed kotler wrote: > What was llvm-ld and why was it removed? > > I see this reference regarding the Eclipse plugin for llvm. > > http://marketplace.eclipse.org/content/llvm-toolchain-eclipse-cdt#.UeUzW47Slk8 > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From letz at grame.fr Tue Jul 16 08:16:41 2013 From: letz at grame.fr (Stéphane Letz) Date: Tue, 16 Jul 2013 17:16:41 +0200 Subject: [LLVMdev] General strategy to optimize LLVM IR Message-ID: <88AE2587-C876-4E7C-A4F9-C864367341E7@grame.fr> Hi, Our DSL emits sub-optimal LLVM IR that we optimize later on (LLVM IR ==> LLVM IR) before dynamically compiling it with the JIT. We would like to simply follow what clang/clang++ does when compiling with -O1/-O2/-O3 options. Our strategy up to now was to look at the opt.cpp code and take part of it in order to implement our optimization code. It appears to be rather difficult to follow the evolution of the LLVM IR optimization strategies. With LLVM 3.3 our optimization code does not produce code as fast as the one produced with clang -O3 anymore. Moreover, the new vectorization passes are still not working. Is there a recommended way to add -O1/-O2/-O3 kinds of optimizations on LLVM IR code? Any code to look at besides the opt.cpp tool? Thanks.
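What clang and opt actually do at -O1/-O2/-O3 is drive PassManagerBuilder rather than hard-code pass lists, so driving that class directly is the usual way to track the standard pipeline as it evolves. A rough sketch against the 3.3-era API follows — untested here, and the exact headers and field names (e.g. the vectorizer toggles) vary by release, so treat it as an outline rather than a definitive recipe:

```cpp
// Sketch only: populate -O3-style pass managers the same way clang/opt
// do, via PassManagerBuilder (LLVM 3.3-era API; adjust per release).
#include "llvm/IR/Module.h"
#include "llvm/PassManager.h"
#include "llvm/Transforms/IPO.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"

void optimizeLikeO3(llvm::Module &M) {
  llvm::PassManagerBuilder Builder;
  Builder.OptLevel = 3;
  Builder.SizeLevel = 0;
  // At -O3 clang installs the inliner with a higher threshold.
  Builder.Inliner = llvm::createFunctionInliningPass(275);
  Builder.LoopVectorize = true; // field name as of 3.3-era trunk

  llvm::FunctionPassManager FPM(&M);
  llvm::PassManager MPM;
  Builder.populateFunctionPassManager(FPM);
  Builder.populateModulePassManager(MPM);

  FPM.doInitialization();
  for (llvm::Module::iterator F = M.begin(), E = M.end(); F != E; ++F)
    FPM.run(*F);
  FPM.doFinalization();
  MPM.run(M);
}
```

Keeping the pipeline behind PassManagerBuilder means you pick up upstream pipeline changes (including the vectorizers) by toggling flags, instead of copying pass lists out of opt.cpp each release; the same class also exposes populateLTOPassManager() for the LTO pipeline.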
Stéphane Letz From elena.demikhovsky at intel.com Tue Jul 16 09:19:26 2013 From: elena.demikhovsky at intel.com (Demikhovsky, Elena) Date: Tue, 16 Jul 2013 16:19:26 +0000 Subject: [LLVMdev] Operand constrain specification Message-ID: Hi, How can I specify in a .td file that source and destination should not use the same register? Thanks. - Elena --------------------------------------------------------------------- Intel Israel (74) Limited This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hfinkel at anl.gov Tue Jul 16 09:32:24 2013 From: hfinkel at anl.gov (Hal Finkel) Date: Tue, 16 Jul 2013 11:32:24 -0500 (CDT) Subject: [LLVMdev] Operand constrain specification In-Reply-To: Message-ID: <397121722.12373772.1373992344390.JavaMail.root@alcf.anl.gov> ----- Original Message ----- > > > Hi, > > How can I specify in a .td file that source and destination should > not use the same register? I think that you can use the EarlyClobber operand flag to achieve this (TableGen has an @earlyclobber constraint; there are some examples in the ARM backend). -Hal > > Thanks. > > > * Elena > > > > > > --------------------------------------------------------------------- > Intel Israel (74) Limited > > This e-mail and any attachments may contain confidential material for > the sole use of the intended recipient(s). Any review or distribution > by others is strictly prohibited. If you are not the intended > recipient, please contact the sender and delete all copies. 
> _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory From cameron.mcinally at nyu.edu Tue Jul 16 09:34:57 2013 From: cameron.mcinally at nyu.edu (Cameron McInally) Date: Tue, 16 Jul 2013 12:34:57 -0400 Subject: [LLVMdev] Operand constrain specification In-Reply-To: References: Message-ID: On Tue, Jul 16, 2013 at 12:19 PM, Demikhovsky, Elena wrote: > Hi, > > How can I specify in a .td file that source and destination should not use > the same register? Hey Elena, We use @earlyclobber. E.g. let Constraints = "@earlyclobber $dst" { ... } -Cameron From sjcrane at uci.edu Tue Jul 16 09:45:44 2013 From: sjcrane at uci.edu (Stephen Crane) Date: Tue, 16 Jul 2013 09:45:44 -0700 Subject: [LLVMdev] Command Line Flags for LTOModule Message-ID: <447CB861-3D7F-4E50-8C4C-0D75311F9D19@uci.edu> While looking at adding a new TargetOption, I saw that there is significant overlap between the options listed in llvm/CodeGen/CommandFlags.h (which are used to set TargetOptions in llc and opt) and the options in LTOModule.cpp. There are only a few extra options in CommandFlags.h, and all target options used by LTO are there. Would it make sense to use CommandFlags.h in LTOModule as well? - Stephen Crane -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tobias at grosser.es Tue Jul 16 10:14:07 2013 From: tobias at grosser.es (Tobias Grosser) Date: Tue, 16 Jul 2013 10:14:07 -0700 Subject: [LLVMdev] New Attribute Group broke bitcode compatibility In-Reply-To: <51DCB20D.7010606@grosser.es> References: <20130524122654.46ED42A6C029@llvm.org> <51DCB20D.7010606@grosser.es> Message-ID: <51E57F5F.20901@grosser.es> On 07/09/2013 05:59 PM, Tobias Grosser wrote: > On 05/24/2013 05:26 AM, Diego Novillo wrote: >> Author: dnovillo >> Date: Fri May 24 07:26:52 2013 >> New Revision: 182638 >> >> URL: http://llvm.org/viewvc/llvm-project?rev=182638&view=rev >> Log: >> Add a new function attribute 'cold' to functions. >> >> Other than recognizing the attribute, the patch does little else. >> It changes the branch probability analyzer so that edges into >> blocks postdominated by a cold function are given low weight. >> >> Added analysis and code generation tests. Added documentation for the >> new attribute. > > It seems this commit broke bitcode compatibility with LLVM 3.3, but > surprisingly not with LLVM 3.2. This suggests that 3.2 did not yet > depend on the order of the attribute enum whereas 3.3 somehow seems > to depend on it. This may be related to Bill's attribute refactoring. Hi Bill, I just looked a little more into the above problem and it seems the bitcode writer support for the new attribute code produces unstable bitcode. The problem is in BitcodeWriter.cpp, where the new WriteAttributeGroupTable() writes out the attributes using this piece of code: if (Attr.isEnumAttribute()) { Record.push_back(0); Record.push_back(Attr.getKindAsEnum()); } else if (Attr.isAlignAttribute()) { Record.push_back(1); Record.push_back(Attr.getKindAsEnum()); Record.push_back(Attr.getValueAsInt()); getKindAsEnum() returns the actual value of the enum, which is then stored in the bitcode. 
This direct connection makes the bitcode dependent of the order of elements in the enum, which causes changes like the above to break bitcode compatibility. Specifically, bitcode from LLVM 3.3 is currently incompatible to bitcode from LLVM trunk. Do you have any plans to fix this or should I give it a shot? Cheers, Tobias From shuxin.llvm at gmail.com Tue Jul 16 10:37:11 2013 From: shuxin.llvm at gmail.com (Shuxin Yang) Date: Tue, 16 Jul 2013 10:37:11 -0700 Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage. In-Reply-To: <1B816499-0C43-475F-911E-0496FBAE145A@apple.com> References: <51E087E2.5040101@gmail.com> <1B816499-0C43-475F-911E-0496FBAE145A@apple.com> Message-ID: <51E584C7.9020502@gmail.com> On 7/16/13 5:23 AM, Evan Cheng wrote: > Thanks for the proposal. This is important work which is one step towards making LTO more applicable for large applications. Some comments inline. > > On Jul 12, 2013, at 3:49 PM, Shuxin Yang wrote: > >> >> 3.1.1 Figure out Partition scheme >> ---------------------------------- >> we randomly pick up some function and put them in a partition. >> It would be nice to perform some optimization at this moment. One opt >> in my mind is to reorder functions in order to reduce working-set and >> improve locality. >> >> Unfortunately, this opt seems to be bit blind at this time, because >> - CallGraph is not annotated with estimated or profiled frequency. >> - some linkers don't respect the order. It seems they just >> remembers the function order of the pristine input obj/fake-obj, >> and enforce this order at final link (link-exec/shared-lib) stage. >> >> Anyway, I try to ignore all these problems, and try to perform partition >> via following steps. Maybe we have some luck on some platforms: >> >> o. DFS the call-graph, ignoring the self-resursive edges, if freq is >> available, prioritizing the edges (i.e. corresponding to call-sites) >> such that frequent edges are visited first. >> >> o. 
Cut the DFS spanning tree obtained from the previous step bottom-up, >> Each cut/partition contains a reasonable # of functions, and the aggregate >> size of the functions of the partition should not exceed a predefined >> threshold. > I'd like to see more details about this step. How do you determine the "reasonable # of functions"? How do you define the threshold? It has to be the same for a given target / platform regardless of the configuration of the build machine, right? Say, each partition should contain: - no more than 100 functions, and - a total size of functions that does not exceed the pre-defined threshold. These thresholds can be overridden on the command line. >> o. repeat the previous step until the Call-graph's DFS spanning tree >> is empty. >> >> 3.1.2 Partition transformation >> ------------------------------ >> >> This is a bit involved. There are a bunch of problems we have to tackle. >> 1) When the use/def of a symbol are separated in different modules, >> its attributes, like linkage and visibility, need to be changed >> as well. >> >> [Example 1], if a symbol is flagged as "internal" to the module where >> it is defined, the linkage needs to be changed into "internal" >> to the executable/lib being compiled. >> >> [Example 2], For compile-time constants, their initialized value >> needs to be cloned to the partitions where it is referenced, >> The rationale is to make the post-ipo passes take advantage >> of the initialized value to squeeze some performance. >> >> In order to not bloat the code size, the cloned constant should >> be marked "don't emit". [end of eg2] >> >> Being able to precisely update symbols' attributes is not only >> vital to correctness, it has significant impact on the >> performance as well. >> >> I have not yet taken a thorough investigation of this issue. My >> rudimentary implementation simply flags a symbol "external" when its >> use/def are separated in different modules.
I believe this is one >> of the most difficult parts of this work. I guess it is going to >> take a long time to become stable. >> >> 2) In order to compile each partition in a separate thread (see >> Section 3.2), we have to put partitions in distinct LLVMContexts. >> >> I could be wrong, but I don't find the code which is able to >> perform function cloning across LLVMContexts. >> >> My workaround in the patch is to perform function cloning in >> one LLVMContext (but in a different Module, of course), then >> save the module to a disk file, and load it into memory using a >> new LLVMContext. >> >> It is a bit circuitous and expensive. > Do you plan to fix this? What are the issues that prevented function cloning across multiple LLVMContexts? > > Evan > > We may fix it; I don't know for sure if it is a big gain at this moment. The issue is that, as far as I can tell, the current code base does not have functions that support copying IR across different LLVMContexts. For example, when it copies an instruction from src to dest, it checks the "src", takes a look at its Type, derives the LLVMContext from the Type, and uses the same context for the dest. So, we need to change the existing code. From evan.cheng at apple.com Tue Jul 16 10:40:49 2013 From: evan.cheng at apple.com (Evan Cheng) Date: Tue, 16 Jul 2013 10:40:49 -0700 Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage. In-Reply-To: <51E584C7.9020502@gmail.com> References: <51E087E2.5040101@gmail.com> <1B816499-0C43-475F-911E-0496FBAE145A@apple.com> <51E584C7.9020502@gmail.com> Message-ID: <47D759DE-C3FB-4279-9BA4-0EE12C278701@apple.com> On Jul 16, 2013, at 10:37 AM, Shuxin Yang wrote: > > On 7/16/13 5:23 AM, Evan Cheng wrote: >> Thanks for the proposal. This is important work which is one step towards making LTO more applicable for large applications. Some comments inline.
>> >> On Jul 12, 2013, at 3:49 PM, Shuxin Yang wrote: >> >>> >>> 3.1.1 Figure out Partition scheme >>> ---------------------------------- >>> we randomly pick up some function and put them in a partition. >>> It would be nice to perform some optimization at this moment. One opt >>> in my mind is to reorder functions in order to reduce working-set and >>> improve locality. >>> >>> Unfortunately, this opt seems to be bit blind at this time, because >>> - CallGraph is not annotated with estimated or profiled frequency. >>> - some linkers don't respect the order. It seems they just >>> remembers the function order of the pristine input obj/fake-obj, >>> and enforce this order at final link (link-exec/shared-lib) stage. >>> >>> Anyway, I try to ignore all these problems, and try to perform partition >>> via following steps. Maybe we have some luck on some platforms: >>> >>> o. DFS the call-graph, ignoring the self-resursive edges, if freq is >>> available, prioritizing the edges (i.e. corresponding to call-sites) >>> such that frequent edges are visited first. >>> >>> o. Cut the DFS spanning tree obtained from the previous step bottom-up, >>> Each cut/partition contains reasonable # of functions, and the aggregate >>> size of the functions of the partition should not exceeds predefined >>> threshold. >> I'd like to see more details about this step. How do you determine the "reasonable # of functions"? How do you define the threshold? It has to be the same for a given target / platform regardless of the configuration of the build machine, right? > Say, each module should not contains : > - no more than 100 functions, and > - the total size of the functions in a partition should not exceed the pre-defined threshold, > > These threshold can be override by command line. But how do you come about the thresholds? And are they fixed thresholds or determined based on analysis of function size, etc.? Evan > >>> o. 
repeat the previous step until the Call-graph's DFS spanning tree >>> is empty. >>> >>> 3.1.2 Partition transformation >>> ------------------------------ >>> >>> This is bit involved. There are bunch of problems we have to tackle. >>> 1) When the use/def of a symbol are separated in different modules, >>> its attribute, like linkage, visibility, need to be changed >>> as well. >>> >>> [Example 1], if a symbol is flagged as "internal" to the module where >>> the it is defined, the linkage need to be changed into "internal" >>> to the executable/lib being compiled. >>> >>> [Example 2], For compile-time constants, their initialized value >>> needs to to cloned to the partitions where it is referenced, >>> The rationale is to make the post-ipo passes to take advantage >>> of the initlized value to squeeeze some performance. >>> >>> In order to not bloat the code size, the cloned constant should >>> mark "don't emit". [end of eg2] >>> >>> Being able to precisely update symbols' attribute is not only >>> vital to correctness, it has significant impact to the the >>> performance as well. >>> >>> I have not yet taken a thorough investigation of this issue. My >>> rudimentary implementation is simply flag symbol "external" when its >>> use/def are separated in different module. I believe this is one >>> of the most difficult part of this work. I guess it is going to >>> take long time to become stable. >>> >>> 2) In order to compile each partition in each separate thread (see >>> Section 3.2), we have to put partitions in distinct LLVMContext. >>> >>> I could be wrong, but I don't find the code which is able to >>> perform function cloning across LLVMContext. >>> >>> My workaround in the patch is to perform function cloning in >>> one LLVMContext (but in different Module, of course), then >>> save the module to disk file, and load it to memory using a >>> new LLVMContext. >>> >>> It is bit circuitous and expensive. >> Do you plan to fix this? 
What are the issues that prevented function cloning across multiple LLVMContexts? >> >> Evan >> >> > We may fix it, I don't know for sure if it is a big gain at this moment. > > If issues is that, as far as I can tell, current code base dose not have functions support copying > IR across different LLVMContext. > > For example, when it copy an instruction from src to dest, > it check the "src", take a look of of its Type, and derive LLVMContext from the Type, and use > the same context for the dest. So, we need to change the existing code. > > From shuxin.llvm at gmail.com Tue Jul 16 10:50:03 2013 From: shuxin.llvm at gmail.com (Shuxin Yang) Date: Tue, 16 Jul 2013 10:50:03 -0700 Subject: [LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation In-Reply-To: <851E09B5CA368045827A32DA76E440AF01971B33@SHSMSX104.ccr.corp.intel.com> References: <851E09B5CA368045827A32DA76E440AF019718D4@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF019718EA@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF01971907@SHSMSX104.ccr.corp.intel.com> <3C05A56F-310F-4816-BED5-C7DE530D712C@apple.com> <851E09B5CA368045827A32DA76E440AF01971B33@SHSMSX104.ccr.corp.intel.com> Message-ID: <51E587CB.8000505@gmail.com> On 7/16/13 7:23 AM, Wan, Xiaofei wrote: > Yes, the purpose is similar, we started this job from last year; > But it Shuxin's solution is module based (correct me if I am wrong), we tried this solution and failed for many reasons, it is described in my design document > https://docs.google.com/document/d/1QSkP6AumMCAVpgzwympD5pI3btPJt4SRgjY-vhyfySg/edit?usp=sharing > > we need discuss two solution and compare them, then adopt one solution > > The biggest difference of module based parallelism and function based parallelism are > 1. how to partition module into different pieces which consume similar time, it is a difficult question Why difficult? > 2. 
How to make sure the generated binary is the same each time It depends on what "the same" means. In the merged version, a constant may keep one copy, while in the partitioned version, constants may be duplicated, as the post-IPO passes may generate some constants and these cannot be shared with the same constants generated in other partitions. None of these issues sounds like a problem in practice. > 3. if 2 can't be achieved, it is difficult to validate the correctness of parallelism It has nothing to do with correctness. From shuxin.llvm at gmail.com Tue Jul 16 11:02:52 2013 From: shuxin.llvm at gmail.com (Shuxin Yang) Date: Tue, 16 Jul 2013 11:02:52 -0700 Subject: [LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation In-Reply-To: References: <851E09B5CA368045827A32DA76E440AF019718D4@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF019718EA@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF01971907@SHSMSX104.ccr.corp.intel.com> Message-ID: <51E58ACC.1080706@gmail.com> In addition to the concerns Chandler pointed out, I'm curious about: the execution time of pristine-llc vs "modified-llc with -thd=1", and the exec-time of pristine-clang vs clang-linked-with-the-modified-llc. Thanks On 7/16/13 3:46 AM, Chandler Carruth wrote: > While I think the end goal you're describing is close to the correct > one, I see the high-level strategy for getting there somewhat > differently: > > 1) The code generators are only one collection of function passes that > might be parallelized. Many others might also be parallelized > profitably. The design for parallelism within LLVM's pass management > infrastructure should be sufficiently generic to express all of these > use cases. > > 2) The idea of having multiple pass managers necessitates (unless I > misunderstand) duplicating a fair amount of state. For example, the > caches in immutable analysis passes would no longer be shared, etc.
I > think that is really unfortunate, and would prefer instead to use > parallelizing pass managers that are in fact responsible for the > scheduling of passes. > > 3) It doesn't provide a strategy for parallelizing the leaves of a > CGSCC pass manager which is where a significant portion of the > potential parallelism is available within the middle end. > > 4) It doesn't deal with the (numerous) parts of LLVM that are not > actually thread safe today. They may happen to work with the code > generators you're happening to test, but there is no guarantee. > Notable things to think about here are computing new types, the > use-def lists of globals, commandline flags, and static state > variables. While our intent has been to avoid problems with the last > two that could preclude parallelism, it seems unlikely that we have > succeeded without thorough testing to this point. Instead, I fear we > have leaned heavily on the crutch of one-thread-per-LLVMContext. > > 5) It adds more complexity onto the poorly designed pass manager > infrastructure. Personally, I think that cleanups to the design and > architecture of the pass manager should be prioritized above adding > new functionality like parallelism. However, so far no one has really > had time to do this (including myself). While I would like to have > time in the future to do this, as with everything else in OSS, it > won't be real until the patches start flowing. > > > On Tue, Jul 16, 2013 at 3:33 AM, Wan, Xiaofei > wrote: > > Hi, community: > > For the sake of our business need, I want to enable > "Function-based parallel code generation" to boost up the > compilation of single module, please see the details of the design > and provide your feedbacks on below aspects, thanks! > 1. Is this idea the proper solution for my requirement > 2. This new feature will be enabled by llc -thd=N and has no > impact on original llc when -thd=1 > 3. 
Can this new feature of llc be accepted by community and merged > into LLVM code tree > > Patches > The patch is divided into four separated parts, the all-in-one > patch could be found here: > http://llvm-reviews.chandlerc.com/D1152 > > Design > https://docs.google.com/document/d/1QSkP6AumMCAVpgzwympD5pI3btPJt4SRgjY-vhyfySg/edit?usp=sharing > > > Background > 1. Our business need to compile C/C++ source files into LLVM IR > and link them into a big BC file; the big BC file is then compiled > into binary code on different arch/target devices. > 2. Backend code generation is a time-consuming activity happened > on target device which makes it an important user experience. > 3. Make -j or file based parallelism can't help here since there > is only one big BC file; function-based parallel LLVM backend code > generation is a good solution to improve compilation time which > will fully utilize multi-cores. > > Overall design strategy and goal > 1. Generate totally same binary as what single thread output > 2. No impacts on single thread performance & conformance > 3. Little impacts on LLVM code infrastructure > > Current status and test result > 1. Parallel llc can generate same code as single thread by > "objdump -d", it could pass 10 hours stress test for all > performance benchmark > 2. Parallel llc can introduce ~2.9X performance gain on XEON sever > for 4 threads > > > Thanks > Wan Xiaofei > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu > http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From daniel.malea at intel.com Tue Jul 16 11:04:57 2013 From: daniel.malea at intel.com (Malea, Daniel) Date: Tue, 16 Jul 2013 18:04:57 +0000 Subject: [LLVMdev] make lldb work In-Reply-To: Message-ID: Hi, I notice you're running a 32-bit program; are you also on a 32-bit host, or do you have a 64-bit OS installed? We don't generally test on 32-bit hosts, so it's possible you found a new bug. In addition, there are some known bugs with debugging 32-bit programs (even on 64-bit hosts) which will we hopefully be resolving soon. Nonetheless, I was unable to reproduce the behaviour you reported (with lldb-3.4 Ubuntu package version r186406). What's the output of "breakpoint list" -- does LLDB resolve any address for the breakpoint? Here is my LLDB session on a 64-bit host debugging a 32-bit program: daniel at lautrec:~$ lldb ./a.out Current executable set to './a.out' (i386). (lldb) breakpoint set -l 8 Breakpoint 1: where = a.out`main + 67 at bla.cpp:9, address = 0x080484f3 (lldb) breakpoint set -l 12 Breakpoint 2: no locations (pending). WARNING: Unable to resolve breakpoint to any actual locations. (lldb) breakpoint list Current breakpoints: 1: file = '/home/daniel/bla.cpp', line = 8, locations = 1 1.1: where = a.out`main + 67 at bla.cpp:9, address = 0x080484f3, unresolved, hit count = 0 2: file = '/home/daniel/bla.cpp', line = 12, locations = 0 (pending) (lldb) process launch Process 22954 launched: './a.out' (i386) Process 22954 stopped * thread #1: tid = 0x59aa, 0x080484f3 a.out`main(argc=1, argv=0xffa37624) + 67 at bla.cpp:9, name = 'a.out, stop reason = breakpoint 1.1 frame #0: 0x080484f3 a.out`main(argc=1, argv=0xffa37624) + 67 at bla.cpp:9 6 while ( counter < 10 ) 7 counter++; 8 -> 9 printf("counter: %d\n", counter); 10 11 return 0; 12 } (lldb) From: 小刚 > Reply-To: "Maple.HL at gmail.com" > Date: Monday, 15 July, 2013 10:00 PM To: LLVM List > Subject: [LLVMdev] make lldb work Sorry if asked before. 
I'm new to LLDB and tried to use it according to the lldb project site. I wrote some very simple code like: #include <stdio.h> int main(int argc, char **argv) { int counter = 0; while ( counter < 10 ) counter++; printf("counter: %d\n", counter); return 0; } and the session like: $ clang -g main.c $ lldb-3.4 a.out (lldb) breakpoint set -l 8 ...... (lldb) breakpoint set -l 12 ...... (lldb) breakpoint list ...... (lldb) process launch Process 1105 launched: '/home/maple/debug/arena/a.out' (i386) counter: 10 Process 1105 exited with status = 0 (0x00000000) I checked with gdb, and it works well there. I'm not sure whether it's a bug or a wrong command on my part. I'm using Ubuntu 12.04, and the lldb is from llvm.org/apt. It's svn186357. —— There are two kinds of beauty: one is a profound and moving equation, the other is your faintly weary smile. From dblaikie at gmail.com Tue Jul 16 11:07:30 2013 From: dblaikie at gmail.com (David Blaikie) Date: Tue, 16 Jul 2013 11:07:30 -0700 Subject: [LLVMdev] General strategy to optimize LLVM IR In-Reply-To: <88AE2587-C876-4E7C-A4F9-C864367341E7@grame.fr> References: <88AE2587-C876-4E7C-A4F9-C864367341E7@grame.fr> Message-ID: On Tue, Jul 16, 2013 at 8:16 AM, Stéphane Letz wrote: > Hi, > > Our DSL emits sub-optimal LLVM IR that we optimize later on (LLVM IR ==> LLVM IR) before dynamically compiling it with the JIT. We would like to simply follow what clang/clang++ does when compiling with -O1/-O2/-O3 options. Our strategy up to now was to look at the opt.cpp code and take part of it in order to implement our optimization code. > > It appears to be rather difficult to follow the evolution of the LLVM IR optimization strategies. With LLVM 3.3 our optimization code does not produce code as fast as the one produced with clang -O3 anymore. Moreover the new vectorization passes are still not working. > > Is there a recommended way to add -O1/-O2/-O3 kind of optimizations on LLVM IR code? Any code to look at besides the opt.cpp tool? I'm not /entirely/ sure what you're asking.
It sounds like you're asking "what passes should my compiler's -O1/2/3 flag's correspond to" and one answer to that is to look at Clang (I think Clang's is different from opt/llc's, maybe). > > Thanks. > > Stéphane Letz > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From peter at pcc.me.uk Tue Jul 16 11:10:29 2013 From: peter at pcc.me.uk (Peter Collingbourne) Date: Tue, 16 Jul 2013 11:10:29 -0700 Subject: [LLVMdev] Special case list files; a bug and a slowness issue In-Reply-To: References: <20130713003835.GA15193@pcc.me.uk> Message-ID: <20130716181029.GA2054@pcc.me.uk> On Tue, Jul 16, 2013 at 01:23:30PM +0400, Alexey Samsonov wrote: > Do you want to avoid adding > anchors > for dfsan SpecialCaseList? No, I need the (documented) whole string semantics in dfsan. Thanks, -- Peter From shuxin.llvm at gmail.com Tue Jul 16 11:14:34 2013 From: shuxin.llvm at gmail.com (Shuxin Yang) Date: Tue, 16 Jul 2013 11:14:34 -0700 Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage. In-Reply-To: <47D759DE-C3FB-4279-9BA4-0EE12C278701@apple.com> References: <51E087E2.5040101@gmail.com> <1B816499-0C43-475F-911E-0496FBAE145A@apple.com> <51E584C7.9020502@gmail.com> <47D759DE-C3FB-4279-9BA4-0EE12C278701@apple.com> Message-ID: <51E58D8A.6060803@gmail.com> On 7/16/13 10:40 AM, Evan Cheng wrote: > On Jul 16, 2013, at 10:37 AM, Shuxin Yang wrote: > >> On 7/16/13 5:23 AM, Evan Cheng wrote: >>> Thanks for the proposal. This is important work which is one step towards making LTO more applicable for large applications. Some comments inline. >>> >>> On Jul 12, 2013, at 3:49 PM, Shuxin Yang wrote: >>> >>>> 3.1.1 Figure out Partition scheme >>>> ---------------------------------- >>>> we randomly pick up some function and put them in a partition. >>>> It would be nice to perform some optimization at this moment. 
One opt >>>> in my mind is to reorder functions in order to reduce working-set and >>>> improve locality. >>>> >>>> Unfortunately, this opt seems to be bit blind at this time, because >>>> - CallGraph is not annotated with estimated or profiled frequency. >>>> - some linkers don't respect the order. It seems they just >>>> remembers the function order of the pristine input obj/fake-obj, >>>> and enforce this order at final link (link-exec/shared-lib) stage. >>>> >>>> Anyway, I try to ignore all these problems, and try to perform partition >>>> via following steps. Maybe we have some luck on some platforms: >>>> >>>> o. DFS the call-graph, ignoring the self-resursive edges, if freq is >>>> available, prioritizing the edges (i.e. corresponding to call-sites) >>>> such that frequent edges are visited first. >>>> >>>> o. Cut the DFS spanning tree obtained from the previous step bottom-up, >>>> Each cut/partition contains reasonable # of functions, and the aggregate >>>> size of the functions of the partition should not exceeds predefined >>>> threshold. >>> I'd like to see more details about this step. How do you determine the "reasonable # of functions"? How do you define the threshold? It has to be the same for a given target / platform regardless of the configuration of the build machine, right? >> Say, each module should not contains : >> - no more than 100 functions, and >> - the total size of the functions in a partition should not exceed the pre-defined threshold, >> >> These threshold can be override by command line. > But how do you come about the thresholds? And are they fixed thresholds or determined based on analysis of function size, etc.? Yes, just count the total # of instructions. I don't know which threshold is "comfortable" at this moment. We can keep changing it until we feel comfortable with it. 
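The size-capped partitioning being discussed here can be sketched outside of LLVM. The following toy is only an illustration of the greedy scheme under debate: the 100-function cap and the instruction threshold are the placeholder values from the thread (not settled defaults), and a plain instruction count stands in for whatever size metric is eventually chosen. A partition is closed as soon as adding the next function would exceed either cap:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Func {
  std::string Name;
  std::size_t NumInsts; // rough size metric: total # of instructions
};

// Greedily pack functions (assumed to arrive in DFS bottom-up order)
// into partitions, closing the current partition once adding the next
// function would break either the function-count or the size cap.
// A single oversized function still gets a partition of its own.
std::vector<std::vector<Func>>
partition(const std::vector<Func> &Funcs, std::size_t MaxFuncs,
          std::size_t MaxInsts) {
  std::vector<std::vector<Func>> Parts;
  std::vector<Func> Cur;
  std::size_t CurInsts = 0;
  for (const Func &F : Funcs) {
    if (!Cur.empty() &&
        (Cur.size() >= MaxFuncs || CurInsts + F.NumInsts > MaxInsts)) {
      Parts.push_back(Cur);
      Cur.clear();
      CurInsts = 0;
    }
    Cur.push_back(F);
    CurInsts += F.NumInsts;
  }
  if (!Cur.empty())
    Parts.push_back(Cur);
  return Parts;
}
```

For example, four functions of sizes 50/60/10/5 under caps (2 functions, 100 instructions) split into three partitions: {50}, {60, 10}, {5}.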
From micah.villmow at smachines.com Tue Jul 16 11:34:32 2013 From: micah.villmow at smachines.com (Micah Villmow) Date: Tue, 16 Jul 2013 18:34:32 +0000 Subject: [LLVMdev] std::vector usage in Machine* data types Message-ID: <3947CD34E13C4F4AB2D94AD35AE3FE60070820A0@smi-exchange1.smi.local> Is there a specific reason why std::vector is used for MachineBasicBlock Pred/Succ and MachineInstr operands? It seems these would be the ideal case for using SmallVector. Thanks, Micah -------------- next part -------------- An HTML attachment was scrubbed... URL: From jbytheway+llvm at gmail.com Mon Jul 15 19:50:38 2013 From: jbytheway+llvm at gmail.com (John Bytheway) Date: Mon, 15 Jul 2013 22:50:38 -0400 Subject: [LLVMdev] design for an accurate ODR-checker with clang In-Reply-To: References: <20841B25-DA88-44B5-AC65-9202C6FDA0E9@apple.com> <04D075BA-2997-4718-90F7-C76773937BB1@apple.com> Message-ID: On 2013-07-15 18:20, Richard Smith wrote: > On Mon, Jul 15, 2013 at 3:12 PM, John McCall > > wrote: > > On Jul 11, 2013, at 6:13 PM, Nick Lewycky > > wrote: >> This is the right basic design, but I'm curious why you're >> suggesting that the payload should just be a hash instead of >> an arbitrary string. >> >> >> What are you suggesting goes into this string? > > The same sorts of things that you were planning on hashing, but > maybe not hashed. It's up to you; having a full string would let > you actually show a useful error message, but it definitely inflates > binary sizes. If you really think you can make this performant > enough to do on every load, I can see how the latter would be important. > > (Perhaps this also clarifies why we want a hash: an unhashed string > would contain as much entropy as the entirety of the source code...) Maybe you can't afford to store the unhashed data for everything, but what about an option to store it for particular function(s)/class(es). 
That way, once an ODR violation has been detected through the hash infrastructure, the compilation/linking can be repeated with more data stored, and yield a decent error message about what exactly changed between the two definitions. John Bytheway From xiaofei.wan at intel.com Tue Jul 16 03:12:08 2013 From: xiaofei.wan at intel.com (Wan, Xiaofei) Date: Tue, 16 Jul 2013 10:12:08 +0000 Subject: [LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation Message-ID: <851E09B5CA368045827A32DA76E440AF0197189D@SHSMSX104.ccr.corp.intel.com> Hi, community: For the sake of our business needs, I want to enable "function-based parallel code generation" to speed up the compilation of a single module; please see the details of the design and provide your feedback on the aspects below, thanks: 1. Is this idea the proper solution for my requirement? 2. This new feature will be enabled by llc -thd=N and has no impact on the original llc when -thd=1. 3. Can this new feature of llc be accepted by the community and merged into the LLVM code tree? Patches The patch is divided into four separate parts; the all-in-one patch can be found here: http://llvm-reviews.chandlerc.com/D1152 Design https://docs.google.com/document/d/1QSkP6AumMCAVpgzwympD5pI3btPJt4SRgjY-vhyfySg/edit?usp=sharing Function-based parallel LLVM backend code generation Wan Xiaofei (xiaofei.wan at intel.com) Background - Our business needs to compile C/C++ source files into LLVM IR and link them into a big BC file; the big BC file is then compiled into binary code on different arch/target devices. - Backend code generation is a time-consuming activity that happens on the target device, which makes it an important part of the user experience. - Make -j or file-based parallelism can't help here since there is only one big BC file; function-based parallel LLVM backend code generation is a good solution to improve compilation time, fully utilizing multiple cores.
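The function-level parallelism this background motivates can be pictured with a small stand-in harness. This is only a sketch of the work-distribution idea, not the patch's actual code: the `"code(...)"` string transformation is a placeholder for running the real per-function pass pipeline, and results are written into per-function slots so the output order matches the input order regardless of thread scheduling:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Each worker repeatedly claims the next unprocessed function index
// from a shared atomic counter and "compiles" it into its own slot.
// Because slots are indexed by the original position, the result
// vector is deterministic no matter how the threads interleave.
std::vector<std::string>
compileAll(const std::vector<std::string> &Funcs, unsigned NumThreads) {
  std::vector<std::string> Obj(Funcs.size());
  std::atomic<std::size_t> NextIdx{0};
  auto Worker = [&] {
    for (std::size_t I; (I = NextIdx.fetch_add(1)) < Funcs.size();)
      Obj[I] = "code(" + Funcs[I] + ")"; // stand-in for real codegen
  };
  std::vector<std::thread> Threads;
  for (unsigned T = 0; T < NumThreads; ++T)
    Threads.emplace_back(Worker);
  for (auto &Th : Threads)
    Th.join();
  return Obj;
}
```

The atomic work counter keeps load balanced (a thread that finishes a cheap function immediately claims the next one), which matters because per-function codegen times vary widely.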
Overall design strategy and goal - Generate totally the same binary as the single-thread output - No impact on single-thread performance - Little impact on the LLVM code infrastructure Why not choose module-based parallelism - Module partition -- How to partition one module into several modules which will consume similar time; compilation time is not decided by instruction number alone. Compilation time depends on instruction categories + instruction number + CFG + others. -- We tried this solution and stopped after the obstacles below. Global variables & functions may be used by other modules; each global variable & constant has a use list, and the use list has to be re-constructed during module partition. Functions and variables must be cloned out since they can't belong to two modules; this may be a big effort and a waste of memory, especially for big BC files. - Binaries merge -- Linking the different binaries is needed after all modules are finished; usually linking is a time-consuming activity. - Validation strategy -- To simplify the validation, generating the same binary is the best solution (including symbols and function order). It is not easy to generate totally identical binaries even when module-based parallelism is correct. Symbol & temp variable mangling are done in different modules; it is difficult to ensure the symbols are the same. -- The function order after linking may not be the same as what it would be in one module. - Potential benefit -- What is the benefit from module-based parallelism; can it bring more benefit than function-based parallelism? -- Module partition & module linkage are two extra overheads. -- Global variables would be cloned several times, which is a big memory penalty. Design of function-based parallelism [cid:image001.png at 01CE824E.0C5C7950] Step 1: Make LLVM passes reentrant - Function passes should be thread-safe since function-based parallelism is adopted.
- UseList and ValueHandleList of the 'Constant' class may be accessed by different functions; all operations on UseList and ValueHandleList should be locked. - LLVMContext will be shared by different functions; all accesses to LLVMContext should be locked. - For allocators which use the default SlabAllocator (which is static), operations on these kinds of allocators should be locked. - Symbols in MCContext are accessed by different functions; they should be locked. Step 2: Multiple pass managers * The role of the PM in the LLVM code generator -- PassManager is the top-level pass manager; it contains all module-level passes which are necessary to generate the binary code. Of all module-level passes, the function pass manager is the biggest module pass. -- PassManager controls all steps during code generation; a function should walk through all passes contained in the function pass manager to emit the final binary code. -- A pass can't be shared by different functions/threads simultaneously since a pass contains much intermediate information. * Multiple pass managers -- Multiple pass managers are created to implement parallel compilation; each pass manager is owned by one thread. -- Among all passes/pass managers, there is one parent pass/pass manager which will delegate some activities for the other passes/pass managers. -- AsmPrinter is the last pass for function passes and is shared by all threads; in this pass, the parent AsmPrinter will delegate the code emission for the other threads. Step 3: Share the last pass "AsmPrinter" * AsmPrinter is the last function pass which will emit the final binary code; it is shared by different functions/threads. * AsmPrinter is responsible for merging instructions generated by different threads. * AsmPrinter will provide a mechanism to make sure the instruction sequences are the same as what they are in a single thread.
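Step 3's requirement — per-function output produced on worker threads but appended in the exact single-thread order — can be illustrated with a small stand-alone sketch. This toy emitter is an assumption-laden stand-in for the shared AsmPrinter described above, not the patch's actual mechanism: each worker blocks until it is its function's turn, so the merged stream is byte-identical to a sequential run:

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Toy shared "printer": workers hand in per-function output, but it is
// appended strictly in original function order, so the merged stream is
// identical to what a single thread would produce.
class OrderedEmitter {
  std::mutex M;
  std::condition_variable CV;
  std::size_t Next = 0; // index of the function we may emit next
  std::string Out;

public:
  void emit(std::size_t Index, const std::string &Body) {
    std::unique_lock<std::mutex> L(M);
    // Block until every lower-indexed function has been emitted.
    CV.wait(L, [&] { return Index == Next; });
    Out += Body;
    ++Next;
    CV.notify_all();
  }
  std::string result() {
    std::lock_guard<std::mutex> L(M);
    return Out;
  }
};
```

Codegen itself still runs in parallel; only the final append serializes, which is the same trade-off as funneling all threads through one shared AsmPrinter.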
Validation methodology & test result * Parallel llc will generate totally same binary code as single thread, similar validation to single thread can be used * Long time stress tests are launched to guarantee the correctness and robustness. Current status and test result * Parallel llc can generate same code as single thread by "objdump -d", it could pass 10 hours stress test for all performance benchmark * Parallel llc can introduce ~2.9X performance gain on XEON sever for 4 threads Thanks Wan Xiaofei -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 202279 bytes Desc: image001.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Parallel.CG.Threading.patch Type: application/octet-stream Size: 5225 bytes Desc: Parallel.CG.Threading.patch URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Parallel.CG.AsmPrinter.patch Type: application/octet-stream Size: 29239 bytes Desc: Parallel.CG.AsmPrinter.patch URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Parallel.CG.MultiplePM.patch Type: application/octet-stream Size: 21920 bytes Desc: Parallel.CG.MultiplePM.patch URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Parallel.CG.PassReentrant.patch Type: application/octet-stream Size: 56511 bytes Desc: Parallel.CG.PassReentrant.patch URL: From spop at codeaurora.org Tue Jul 16 11:42:33 2013 From: spop at codeaurora.org (Sebastian Pop) Date: Tue, 16 Jul 2013 13:42:33 -0500 Subject: [LLVMdev] Analysis of polly-detect overhead in oggenc In-Reply-To: <32e1ec0e.464a.13fddb6c5c7.Coremail.tanmx_star@yeah.net> References: <51E02B5C.3030208@grosser.es> <727cd06.251b.13fd9059546.Coremail.tanmx_star@yeah.net> <51E19CAF.4050500@grosser.es> <73bc0067.345f.13fdb666d45.Coremail.tanmx_star@yeah.net> <51E2352A.6090706@grosser.es> <7aa4b9a.3bcb.13fdc808c5f.Coremail.tanmx_star@yeah.net> <32e1ec0e.464a.13fddb6c5c7.Coremail.tanmx_star@yeah.net> Message-ID: <20130716184233.GA22630@codeaurora.org> Star Tan wrote: > I have found that the extremely expensive compile-time overhead comes from the string buffer operation for "INVALID" MACRO in the polly-detect pass. > Attached is a hack patch file that simply remove the string buffer operation. This patch file can significantly reduce compile-time overhead when compiling big source code. For example, for oggen*8.ll, the compile time is reduced from 40.5261 ( 51.2%) to 5.8813s (15.9%) with this patch file. On top of your patch, I have removed from ScopDetection.cpp all printing of LLVM values, like this: - INVALID(AffFunc, "Non affine access function: " << *AccessFunction); + INVALID(AffFunc, "Non affine access function: "); there are a good dozen or so of these pretty printing. With these changes the compile time spent in ScopDetection drops dramatically to almost 0: here is the longest running one in the compilation of an Android stack: 2.1900 ( 13.7%) 0.0100 ( 7.7%) 2.2000 ( 13.6%) 2.2009 ( 13.4%) Polly - Detect static control parts (SCoPs) Before these changes, the top most expensive ScopDetection time used to be a few hundred of seconds. Sebastian -- Qualcomm Innovation Center, Inc. 
is a member of Code Aurora Forum, hosted by The Linux Foundation

From stoklund at 2pi.dk Tue Jul 16 11:45:46 2013
From: stoklund at 2pi.dk (Jakob Stoklund Olesen)
Date: Tue, 16 Jul 2013 11:45:46 -0700
Subject: [LLVMdev] std::vector usage in Machine* data types
In-Reply-To: <3947CD34E13C4F4AB2D94AD35AE3FE60070820A0@smi-exchange1.smi.local>
References: <3947CD34E13C4F4AB2D94AD35AE3FE60070820A0@smi-exchange1.smi.local>
Message-ID: <821C1DF0-A09D-4D0A-B33F-97881BC73B02@2pi.dk>

On Jul 16, 2013, at 11:34 AM, Micah Villmow wrote:
> Is there a specific reason why std::vector is used for MachineBasicBlock Pred/Succ and MachineInstr operands?
>
> It seems these would be the ideal case for using SmallVector.

I agree. Patches welcome.

Thanks,
/jakob

From shuxin.llvm at gmail.com Tue Jul 16 12:47:44 2013
From: shuxin.llvm at gmail.com (Shuxin Yang)
Date: Tue, 16 Jul 2013 12:47:44 -0700
Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage.
In-Reply-To: <51E584C7.9020502@gmail.com>
References: <51E087E2.5040101@gmail.com> <1B816499-0C43-475F-911E-0496FBAE145A@apple.com> <51E584C7.9020502@gmail.com>
Message-ID: <51E5A360.8080504@gmail.com>

I recall another reason for not adding the functionality for cloning functions across LLVMContexts.

Currently, LTO merges all stuff together into a "merged module" before IPO and post-IPO start. I believe this will have to change in the future if we need to improve scalability. As with other compilers, we only care about the "merged symbol table" and "merged call-graph" in some passes, and can load parts of the merged module on demand. I guess function cloning will need some changes if we switch to that mode.

So for now, we are better off using whatever we have today, instead of messing around implementing something that is about to change in the future.

> 2) In order to compile each partition in each separate thread (see
>>> Section 3.2), we have to put partitions in distinct LLVMContext.
>>>
>>> I could be wrong, but I don't find any code which is able to
>>> perform function cloning across LLVMContexts.
>>>
>>> My workaround in the patch is to perform function cloning in
>>> one LLVMContext (but in a different Module, of course), then
>>> save the module to a disk file, and load it into memory using a
>>> new LLVMContext.
>>>
>>> It is a bit circuitous and expensive.
>> Do you plan to fix this? What are the issues that prevented function
>> cloning across multiple LLVMContexts?
>>
>> Evan
>>
>>
> We may fix it; I don't know for sure if it is a big gain at this moment.
>
> The issue is that, as far as I can tell, the current code base does not
> have functions that support copying
> IR across different LLVMContexts.
>
> For example, when it copies an instruction from src to dest,
> it checks the "src", looks at its Type, derives the LLVMContext
> from the Type, and uses
> the same context for the dest. So, we need to change the existing code.
>

From grosbach at apple.com Tue Jul 16 12:57:29 2013
From: grosbach at apple.com (Jim Grosbach)
Date: Tue, 16 Jul 2013 12:57:29 -0700
Subject: [LLVMdev] Operand constrain specification
In-Reply-To: <397121722.12373772.1373992344390.JavaMail.root@alcf.anl.gov>
References: <397121722.12373772.1373992344390.JavaMail.root@alcf.anl.gov>
Message-ID: <860743FD-8984-4458-9D65-79F001060D54@apple.com>

On Jul 16, 2013, at 9:32 AM, Hal Finkel wrote:
> ----- Original Message -----
>>
>>
>> Hi,
>>
>> How can I specify in a .td file that source and destination should
>> not use the same register?
>
> I think that you can use the EarlyClobber operand flag to achieve this (TableGen has an @earlyclobber constraint; there are some examples in the ARM backend).
>
Yes, that’s exactly what that constraint is for.

-Jim

> -Hal
>
>>
>> Thanks.
>> * Elena
>>
>> ---------------------------------------------------------------------
>> Intel Israel (74) Limited
>>
>> This e-mail and any attachments may contain confidential material for
>> the sole use of the intended recipient(s). Any review or distribution
>> by others is strictly prohibited. If you are not the intended
>> recipient, please contact the sender and delete all copies.
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From xinliangli at gmail.com Tue Jul 16 13:18:12 2013
From: xinliangli at gmail.com (Xinliang David Li)
Date: Tue, 16 Jul 2013 13:18:12 -0700
Subject: [LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation
In-Reply-To: <851E09B5CA368045827A32DA76E440AF01971907@SHSMSX104.ccr.corp.intel.com>
References: <851E09B5CA368045827A32DA76E440AF019718D4@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF019718EA@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF01971907@SHSMSX104.ccr.corp.intel.com>
Message-ID: 

On Tue, Jul 16, 2013 at 3:33 AM, Wan, Xiaofei wrote:
> Hi, community:
>
> For the sake of our business needs, I want to enable "Function-based parallel code generation" to speed up the compilation of a single module; please see the details of the design and provide your feedback on the aspects below, thanks!
> 1. Is this idea the proper solution for my requirement
> 2.
This new feature will be enabled by llc -thd=N and has no impact on the original llc when -thd=1
> 3. Can this new feature of llc be accepted by the community and merged into the LLVM code tree
>
> Patches
> The patch is divided into four separate parts; the all-in-one patch can be found here:
> http://llvm-reviews.chandlerc.com/D1152
>
> Design
> https://docs.google.com/document/d/1QSkP6AumMCAVpgzwympD5pI3btPJt4SRgjY-vhyfySg/edit?usp=sharing
>
>
> Background
> 1. Our business needs to compile C/C++ source files into LLVM IR and link them into a big BC file; the big BC file is then compiled into binary code on different arch/target devices.
> 2. Backend code generation is a time-consuming activity that happens on the target device, which makes it an important part of the user experience.
> 3. Make -j or file-based parallelism can't help here since there is only one big BC file; function-based parallel LLVM backend code generation is a good solution to improve compilation time which can fully utilize multiple cores.
>
> Overall design strategy and goal
> 1. Generate exactly the same binary as the single-threaded output
> 2. No impact on single-thread performance & conformance
> 3. Little impact on the LLVM code infrastructure
>
> Current status and test result
> 1. Parallel llc can generate the same code as a single thread (verified with "objdump -d"); it passed a 10-hour stress test for all performance benchmarks
> 2. Parallel llc can deliver a ~2.9X performance gain on a XEON server with 4 threads

Ignoring FE time, which can be fully parallelized, and assuming 10% of compile time is spent in serial module passes and 25% in the CGSCC pass, the maximum speedup that can be gained from function-level parallelism is less than 3x. Even adding support for parallel compilation of the leaves of the CG in the CGSCC pass won't help much -- the percentage of leaf functions is < 30% in the large apps I have seen.
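[The bound David states here is just Amdahl's law applied to his stated serial fractions; a quick sketch, treating the 10% (serial module passes) and 25% (CGSCC pass) figures as his assumptions rather than measured data:]

```python
def amdahl_speedup(serial_fraction, workers):
    """Upper bound on speedup when a fixed fraction of the work stays serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)

# Assumed, not measured: 10% serial module passes + 25% CGSCC pass.
serial = 0.10 + 0.25

# With 4 worker threads the bound is already modest:
print(round(amdahl_speedup(serial, 4), 2))      # ~1.95

# Even with unlimited workers it stays below 3x, since 1/0.35 ~= 2.86:
print(round(amdahl_speedup(serial, 10**9), 2))
```

Under these assumptions the asymptote is 1/0.35, which matches the "less than 3x" claim.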
Module-based parallelism as proposed by Shuxin has a max speedup of 10x, assuming body cloning does not add a lot of overhead and a build farm with hundreds/thousands of nodes is used.

David

>
> Thanks
> Wan Xiaofei
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

From chandlerc at google.com Tue Jul 16 13:33:13 2013
From: chandlerc at google.com (Chandler Carruth)
Date: Tue, 16 Jul 2013 13:33:13 -0700
Subject: [LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation
In-Reply-To: 
References: <851E09B5CA368045827A32DA76E440AF019718D4@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF019718EA@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF01971907@SHSMSX104.ccr.corp.intel.com>
Message-ID: 

On Tue, Jul 16, 2013 at 1:18 PM, Xinliang David Li wrote:
> Ignoring FE time, which can be fully parallelized, and assuming 10% of
> compile time is spent in serial module passes and 25% in the CGSCC pass,
> the maximum speedup that can be gained from function-level parallelism is
> less than 3x. Even adding support for parallel compilation of the leaves
> of the CG in the CGSCC pass won't help much -- the percentage of leaf
> functions is < 30% in the large apps I have seen.
>

Can you clarify what you're basing these assumptions on, or how you derived your data?

> Module-based parallelism as proposed by Shuxin has a max speedup of 10x,
> assuming body cloning does not add a lot of overhead and a build farm with
> hundreds/thousands of nodes is used.
>

Body cloning does add some overhead, so that actually needs to be measured. Also, many don't have such a build farm.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From xinliangli at gmail.com Tue Jul 16 13:35:02 2013
From: xinliangli at gmail.com (Xinliang David Li)
Date: Tue, 16 Jul 2013 13:35:02 -0700
Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage.
In-Reply-To: <344919F0-3ED1-4EED-865D-8AC273821A3B@apple.com>
References: <51E087E2.5040101@gmail.com> <344919F0-3ED1-4EED-865D-8AC273821A3B@apple.com>
Message-ID: 

A third approach is to decouple the backend compilation and parallelism strategy from the partitioning. The partitioning can spit out partition BC files and some action records in some standard format. All of this can be fed into a driver tool that converts the compilation action file into make/build files for the underlying build system of your choice:

1) it can simply be a compiler driver that does thread-level parallelism;
2) or a tool that generates Makefiles which are fed into parallel make to exploit single-node parallelism;
3) or a tool that generates BUILD files that feed into a distributed build system (such as Google's blaze: http://google-engtools.blogspot.com/2011/08/build-in-cloud-how-build-system-works.html)

Another benefit is that it will make compiler debugging easier.

thanks,

David

On Sun, Jul 14, 2013 at 5:56 PM, Andrew Trick wrote:
>
> On Jul 12, 2013, at 3:49 PM, Shuxin Yang wrote:
>
> 3.2 Compile partitions independently
> --------------------------------------
>
> There are two camps: one camp advocates compiling partitions via multi-process,
> the other one favors multi-thread.
>
> Inside Apple compiler teams, I'm the only one belonging to the 1st camp. I think
> while multi-proc sounds a bit red-neck, it has its advantage for this purpose, and
> while multi-thread is certainly more eye-popping, it has its advantage as well.
>
> The advantages of multi-proc are:
> 1) easier to implement; each process runs in its own address space.
> We don't need to worry about them interfering with each other.
>
> 2) huge, if not unlimited, address space.
>
> The disadvantage is that it's expensive.
But I guess the cost is
> almost negligible compared to the overall IPO compilation.
>
> The advantages of multi-threads I can imagine are:
> 1) sounds fancy
> 2) it is light-weight
> 3) inter-thread communication is easier than IPC.
>
> Its disadvantages are:
> 1) Oftentimes we will come across a race condition, and it takes an
> awfully long time to figure out. While the code is supposed
> to be multi-thread safe, we might miss some tricky case.
> Troubleshooting a race condition is a nightmare.
>
> 2) Small address space. This is a big problem if the compiler
> is built 32-bit. In that case, the compiler is not able to bring
> lots of stuff into memory even if the HW does
> provide ample memory.
>
> 3) The thread-safe run-time lib is more expensive.
> I once linked a compiler using -lpthread (I did not have to) on a
> UNIX platform, and saw the compiler slow down by about 1/3.
>
> I'm not able to convince the folks in the other camp, neither are they
> able to convince me. I decided to implement both. Fortunately, this
> part is not difficult; it seems to be rather easy to crank out one within a
> short period of time. It would be interesting to compare them side-by-side,
> and see which camp loses:-). On the other hand, if we run into race-condition
> problems, we choose the multi-proc version as a fall-back.
>
>
> While I am a self-proclaimed multi-process red-neck, in this case I would
> prefer to see a multi-threaded implementation because I want to verify that
> LLVMContext can be used as advertised. I'm sure some extra care will be
> needed to report failures/diagnostics, but we should start with the
> assumption that this approach is not significantly harder than multi-process
> because that's how we advertise the design.
>
> If any of the multi-threaded disadvantages you point out are real, I would
> like to find out about it.
>
> 1. Race Conditions: We should be able to verify that the thread-parallel vs.
> sequential or multi-process compilation generate the same result.
If they > diverge, we would like to know about the bug so it can be fixed--independent > of LTO. > > 2. Small Address Space with LTO. We don't need to design around this > hypothetical case. > > 3. Expensive thread-safe runtime lib. We should not speculate that platforms > that we, as the LLVM community, care about have this problem. Let's assume > that our platforms are well implemented unless we have data to the contrary. > (Personally, I would even love to use TLS in the compiler to vastly simplify > API design in the backend, but I am not going to be popular for saying so). > > We should be able to decompose each step of compilation for debugging. So > the multi-process "implementation" should just be a degenerate form of > threading with a bit of driver magic if you want to automate it. > > -Andy > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From xinliangli at gmail.com Tue Jul 16 13:37:50 2013 From: xinliangli at gmail.com (Xinliang David Li) Date: Tue, 16 Jul 2013 13:37:50 -0700 Subject: [LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation In-Reply-To: References: <851E09B5CA368045827A32DA76E440AF019718D4@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF019718EA@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF01971907@SHSMSX104.ccr.corp.intel.com> Message-ID: On Tue, Jul 16, 2013 at 1:33 PM, Chandler Carruth wrote: > On Tue, Jul 16, 2013 at 1:18 PM, Xinliang David Li > wrote: >> >> Ignoring FE time which can be fully parallelized and assuming 10% >> compile time is spent in serial module passes, 25% time is spent in >> CGSCC pass, the maximum speed up that can be gained by using function >> level parallelism is less than 3x. 
Even adding support for parallel >> compilation for leaves of CG in CGSCC pass won't help too much -- the >> percentage of leaf functions is < 30% in large apps I have seen. > > > Can you clarify what you're basing these assumption on or how you derived > your data? > Those numbers are purely speculative -- does Clang has an option to dump the time breakout of each passes such as -ftime-report in GCC? thanks, David >> >> Module based parallelism proposed by Shuxin has max speed up of 10x, >> assuming body cloning does not add a lot overhead and build farm with >> hundred/thousands of nodes is used. > > > Body cloning does add some overhead, so that actually needs to be measured. > Also, many don't have such a build farm. From chandlerc at google.com Tue Jul 16 13:40:26 2013 From: chandlerc at google.com (Chandler Carruth) Date: Tue, 16 Jul 2013 13:40:26 -0700 Subject: [LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation In-Reply-To: References: <851E09B5CA368045827A32DA76E440AF019718D4@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF019718EA@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF01971907@SHSMSX104.ccr.corp.intel.com> Message-ID: On Tue, Jul 16, 2013 at 1:37 PM, Xinliang David Li wrote: > On Tue, Jul 16, 2013 at 1:33 PM, Chandler Carruth > wrote: > > On Tue, Jul 16, 2013 at 1:18 PM, Xinliang David Li > > > wrote: > >> > >> Ignoring FE time which can be fully parallelized and assuming 10% > >> compile time is spent in serial module passes, 25% time is spent in > >> CGSCC pass, the maximum speed up that can be gained by using function > >> level parallelism is less than 3x. Even adding support for parallel > >> compilation for leaves of CG in CGSCC pass won't help too much -- the > >> percentage of leaf functions is < 30% in large apps I have seen. > > > > > > Can you clarify what you're basing these assumption on or how you derived > > your data? 
> > > > Those numbers are purely speculative -- does Clang has an option to > dump the time breakout of each passes such as -ftime-report in GCC? > We have the functionality... I thought we wired -ftime-report up to it? If that doesn't work I'll have to go digging. -------------- next part -------------- An HTML attachment was scrubbed... URL: From xinliangli at gmail.com Tue Jul 16 13:46:20 2013 From: xinliangli at gmail.com (Xinliang David Li) Date: Tue, 16 Jul 2013 13:46:20 -0700 Subject: [LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation In-Reply-To: References: <851E09B5CA368045827A32DA76E440AF019718D4@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF019718EA@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF01971907@SHSMSX104.ccr.corp.intel.com> Message-ID: On Tue, Jul 16, 2013 at 1:40 PM, Chandler Carruth wrote: > On Tue, Jul 16, 2013 at 1:37 PM, Xinliang David Li > wrote: >> >> On Tue, Jul 16, 2013 at 1:33 PM, Chandler Carruth >> wrote: >> > On Tue, Jul 16, 2013 at 1:18 PM, Xinliang David Li >> > >> > wrote: >> >> >> >> Ignoring FE time which can be fully parallelized and assuming 10% >> >> compile time is spent in serial module passes, 25% time is spent in >> >> CGSCC pass, the maximum speed up that can be gained by using function >> >> level parallelism is less than 3x. Even adding support for parallel >> >> compilation for leaves of CG in CGSCC pass won't help too much -- the >> >> percentage of leaf functions is < 30% in large apps I have seen. >> > >> > >> > Can you clarify what you're basing these assumption on or how you >> > derived >> > your data? >> > >> >> Those numbers are purely speculative -- does Clang has an option to >> dump the time breakout of each passes such as -ftime-report in GCC? > > > We have the functionality... I thought we wired -ftime-report up to it? If > that doesn't work I'll have to go digging. 
I just checked -- it produces a flat report -- it would be nice to produce something similar to -debug-pass=Structure outputs. David From chandlerc at google.com Tue Jul 16 13:48:07 2013 From: chandlerc at google.com (Chandler Carruth) Date: Tue, 16 Jul 2013 13:48:07 -0700 Subject: [LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation In-Reply-To: References: <851E09B5CA368045827A32DA76E440AF019718D4@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF019718EA@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF01971907@SHSMSX104.ccr.corp.intel.com> Message-ID: On Tue, Jul 16, 2013 at 1:46 PM, Xinliang David Li wrote: > On Tue, Jul 16, 2013 at 1:40 PM, Chandler Carruth > wrote: > > On Tue, Jul 16, 2013 at 1:37 PM, Xinliang David Li > > > wrote: > >> > >> On Tue, Jul 16, 2013 at 1:33 PM, Chandler Carruth > > >> wrote: > >> > On Tue, Jul 16, 2013 at 1:18 PM, Xinliang David Li > >> > > >> > wrote: > >> >> > >> >> Ignoring FE time which can be fully parallelized and assuming 10% > >> >> compile time is spent in serial module passes, 25% time is spent in > >> >> CGSCC pass, the maximum speed up that can be gained by using function > >> >> level parallelism is less than 3x. Even adding support for parallel > >> >> compilation for leaves of CG in CGSCC pass won't help too much -- the > >> >> percentage of leaf functions is < 30% in large apps I have seen. > >> > > >> > > >> > Can you clarify what you're basing these assumption on or how you > >> > derived > >> > your data? > >> > > >> > >> Those numbers are purely speculative -- does Clang has an option to > >> dump the time breakout of each passes such as -ftime-report in GCC? > > > > > > We have the functionality... I thought we wired -ftime-report up to it? > If > > that doesn't work I'll have to go digging. > > I just checked -- it produces a flat report -- it would be nice to > produce something similar to -debug-pass=Structure outputs. 
Yea, improving the nested timing information, along with other improvements to reflect the pass management structure, is on my list for the significant changes to the pass manager infrastructure I mentioned in my original comments.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From shuxin.llvm at gmail.com Tue Jul 16 13:49:30 2013
From: shuxin.llvm at gmail.com (Shuxin Yang)
Date: Tue, 16 Jul 2013 13:49:30 -0700
Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage.
In-Reply-To: 
References: <51E087E2.5040101@gmail.com> <344919F0-3ED1-4EED-865D-8AC273821A3B@apple.com>
Message-ID: <51E5B1DA.10301@gmail.com>

I have actually come up with the 3 approaches to build the post-IPO objects independently.

The "3rd approach" here is the 1st solution in my original proposal. Almost all coworkers call it sucks:-)
Now I accept that verdict, because it has no way to be adaptive.

Consider the scenario where we compile the llvm compiler. We use "make -j16" on a computer with 8 processors; each make thread invokes a compiler which may blindly invoke 16 threads!
So, we end up with 16*16 threads.

Being adaptive would make it possible to pick the right factor judiciously and adaptively.

In any case, I will support this approach (i.e. the 3rd approach you mentioned) at the very least at the beginning.

On 7/16/13 1:35 PM, Xinliang David Li wrote:
> A third approach is to decouple the backend compilation and
> parallelism strategy from the partitioning. The partitioning can
> spits out partition BC files and some action records in some standard
> format.
All of this can be fed into some driver tools that converts > the compilation action file into make/build file of the underlying > build system of your choice: > > 1) it can simply a compiler driver that does thread level parallelism; > 2) or a tool that generates Makfiles which are fed into parallel make > to explore single node parallelism; > 3) or a tool that generates BUILD files that feed into distributed > build system (such as Google's blaze: > http://google-engtools.blogspot.com/2011/08/build-in-cloud-how-build-system-works.html) > > Another benefit is it will make compiler debugging easier. > > thanks, > > David > > On Sun, Jul 14, 2013 at 5:56 PM, Andrew Trick wrote: >> On Jul 12, 2013, at 3:49 PM, Shuxin Yang wrote: >> >> 3.2 Compile partitions independently >> -------------------------------------- >> >> There are two camps: one camp advocate compiling partitions via >> multi-process, >> the other one favor multi-thread. >> >> Inside Apple compiler teams, I'm the only one belong to the 1st comp. I >> think >> while multi-proc sounds bit red-neck, it has its advantage for this purpose, >> and >> while multi-thread is certainly more eye-popping, it has its advantage >> as well. >> >> The advantage of multi-proc are: >> 1) easier to implement, the process run in its own address space. >> We don't need to worry about they can interfere with each other. >> >> 2)huge, or not unlimited, address space. >> >> The disadvantage is that it's expensive. But I guess the cost is >> almost negligible compared to the overall IPO compilation. >> >> The advantage of multi-threads I can imagine are: >> 1) sound fancy >> 2) it is light-weight >> 3) inter-thread communication is easier than IPC. >> >> Its disadvantage are: >> 1). Oftentime we will come across race-condition, and it took >> awful long time to figure it out. While the code is supposed >> to be mult-thread safe, we might miss some tricky case. >> Trouble-shooting race condition is a nightmare. 
>> >> 2) Small address space. This is big problem if we the compiler >> is built 32-bit . In that case, the compiler is not able to bring >> lots of stuff in memory even if the HW dose >> provide ample mem. >> >> 3) The thread-safe run-time lib is more expensive. >> I once linked a compiler using -lpthread (I dose not have to) on a >> UNIX platform, and saw the compiler slow down by about 1/3. >> >> I'm not able to convince the folks in other camp, neither are they >> able to convince me. I decide to implement both. Fortunately, this >> part is not difficult, it seems to be rather easy to crank out one within >> short >> period of time. It would be interesting to compare them side-by-side, >> and see which camp lose:-). On the other hand, if we run into race-condition >> problem, we choose multi-proc version as a fall-back. >> >> >> While I am a self-proclaimed multi-process red-neck, in this case I would >> prefer to see a multi-threaded implementation because I want to verify that >> LLVMContext can be used as advertised. I'm sure some extra care will be >> needed to report failures/diagnostics, but we should start with the >> assumption that this approach is not significantly harder than multi-process >> because that's how we advertise the design. >> >> If any of the multi-threaded disadvantages you point out are real, I would >> like to find out about it. >> >> 1. Race Conditions: We should be able to verify that the thread-parallel vs. >> sequential or multi-process compilation generate the same result. If they >> diverge, we would like to know about the bug so it can be fixed--independent >> of LTO. >> >> 2. Small Address Space with LTO. We don't need to design around this >> hypothetical case. >> >> 3. Expensive thread-safe runtime lib. We should not speculate that platforms >> that we, as the LLVM community, care about have this problem. Let's assume >> that our platforms are well implemented unless we have data to the contrary. 
>> (Personally, I would even love to use TLS in the compiler to vastly simplify >> API design in the backend, but I am not going to be popular for saying so). >> >> We should be able to decompose each step of compilation for debugging. So >> the multi-process "implementation" should just be a degenerate form of >> threading with a bit of driver magic if you want to automate it. >> >> -Andy >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> From xinliangli at gmail.com Tue Jul 16 14:04:45 2013 From: xinliangli at gmail.com (Xinliang David Li) Date: Tue, 16 Jul 2013 14:04:45 -0700 Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage. In-Reply-To: <51E5B1DA.10301@gmail.com> References: <51E087E2.5040101@gmail.com> <344919F0-3ED1-4EED-865D-8AC273821A3B@apple.com> <51E5B1DA.10301@gmail.com> Message-ID: On Tue, Jul 16, 2013 at 1:49 PM, Shuxin Yang wrote: > I have actually came up the 3 approaches to build the post-ipo object > independently. > > The "3rd approach" here is the 1st solution in my original proposal. Almost > all coworkers call it sucks:-) > Now I accept it because the it has no way to be adaptive. > > Consider the scenario we compile the llvm compiler. We use "make -j16" for > computer with 8 processor, each make-thread invoke a compiler which may > blindly invoke 16 threads! > So, we end up to have 16*16 threads. > Determining the right parallelism is not the job of the compiler (builtin) nor that of a developer -- the underlying build system should take care of the scheduling :) David > Being adaptive will render it possible to pick up right factor judiciously > and adpatively. > > In any case, I will support this approach (i.e. the 3rd approach you > mentioned) at very least at beginning. 
> > > > On 7/16/13 1:35 PM, Xinliang David Li wrote: >> >> A third approach is to decouple the backend compilation and >> parallelism strategy from the partitioning. The partitioning can >> spits out partition BC files and some action records in some standard >> format. All of this can be fed into some driver tools that converts >> the compilation action file into make/build file of the underlying >> build system of your choice: >> >> 1) it can simply a compiler driver that does thread level parallelism; >> 2) or a tool that generates Makfiles which are fed into parallel make >> to explore single node parallelism; >> 3) or a tool that generates BUILD files that feed into distributed >> build system (such as Google's blaze: >> >> http://google-engtools.blogspot.com/2011/08/build-in-cloud-how-build-system-works.html) >> >> Another benefit is it will make compiler debugging easier. >> >> thanks, >> >> David >> >> On Sun, Jul 14, 2013 at 5:56 PM, Andrew Trick wrote: >>> >>> On Jul 12, 2013, at 3:49 PM, Shuxin Yang wrote: >>> >>> 3.2 Compile partitions independently >>> -------------------------------------- >>> >>> There are two camps: one camp advocate compiling partitions via >>> multi-process, >>> the other one favor multi-thread. >>> >>> Inside Apple compiler teams, I'm the only one belong to the 1st comp. I >>> think >>> while multi-proc sounds bit red-neck, it has its advantage for this >>> purpose, >>> and >>> while multi-thread is certainly more eye-popping, it has its advantage >>> as well. >>> >>> The advantage of multi-proc are: >>> 1) easier to implement, the process run in its own address space. >>> We don't need to worry about they can interfere with each other. >>> >>> 2)huge, or not unlimited, address space. >>> >>> The disadvantage is that it's expensive. But I guess the cost is >>> almost negligible compared to the overall IPO compilation. 
>>>
>>> The advantages of multi-threads I can imagine are:
>>> 1) sounds fancy
>>> 2) it is light-weight
>>> 3) inter-thread communication is easier than IPC.
>>>
>>> Its disadvantages are:
>>> 1) Oftentimes we will come across a race condition, and it takes an
>>> awfully long time to figure it out. While the code is supposed
>>> to be multi-thread safe, we might miss some tricky case.
>>> Troubleshooting race conditions is a nightmare.
>>>
>>> 2) Small address space. This is a big problem if the compiler
>>> is built 32-bit. In that case, the compiler is not able to bring
>>> lots of stuff into memory even if the HW does provide ample mem.
>>>
>>> 3) The thread-safe run-time lib is more expensive.
>>> I once linked a compiler using -lpthread (I did not have to) on a
>>> UNIX platform, and saw the compiler slow down by about 1/3.
>>>
>>> I'm not able to convince the folks in the other camp, neither are they
>>> able to convince me. I decided to implement both. Fortunately, this
>>> part is not difficult; it seems to be rather easy to crank out one
>>> within a short period of time. It would be interesting to compare them
>>> side-by-side, and see which camp loses :-). On the other hand, if we
>>> run into race-condition problems, we choose the multi-proc version as
>>> a fall-back.
>>>
>>> While I am a self-proclaimed multi-process red-neck, in this case I would
>>> prefer to see a multi-threaded implementation because I want to verify
>>> that LLVMContext can be used as advertised. I'm sure some extra care
>>> will be needed to report failures/diagnostics, but we should start with
>>> the assumption that this approach is not significantly harder than
>>> multi-process because that's how we advertise the design.
>>>
>>> If any of the multi-threaded disadvantages you point out are real, I
>>> would like to find out about it.
>>>
>>> 1. Race Conditions: We should be able to verify that the thread-parallel
>>> vs. sequential or multi-process compilation generate the same result.
>>> If they diverge, we would like to know about the bug so it can be
>>> fixed--independent of LTO.
>>>
>>> 2. Small Address Space with LTO. We don't need to design around this
>>> hypothetical case.
>>>
>>> 3. Expensive thread-safe runtime lib. We should not speculate that
>>> platforms that we, as the LLVM community, care about have this problem.
>>> Let's assume that our platforms are well implemented unless we have
>>> data to the contrary. (Personally, I would even love to use TLS in the
>>> compiler to vastly simplify API design in the backend, but I am not
>>> going to be popular for saying so).
>>>
>>> We should be able to decompose each step of compilation for debugging.
>>> So the multi-process "implementation" should just be a degenerate form
>>> of threading with a bit of driver magic if you want to automate it.
>>>
>>> -Andy
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>
From shuxin.llvm at gmail.com Tue Jul 16 14:10:19 2013
From: shuxin.llvm at gmail.com (Shuxin Yang)
Date: Tue, 16 Jul 2013 14:10:19 -0700
Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage.
In-Reply-To: 
References: <51E087E2.5040101@gmail.com> <344919F0-3ED1-4EED-865D-8AC273821A3B@apple.com> <51E5B1DA.10301@gmail.com>
Message-ID: <51E5B6BB.60707@gmail.com>

On 7/16/13 2:04 PM, Xinliang David Li wrote:
> On Tue, Jul 16, 2013 at 1:49 PM, Shuxin Yang wrote:
>> I have actually come up with 3 approaches to build the post-IPO objects
>> independently.
>>
>> The "3rd approach" here is the 1st solution in my original proposal.
>> Almost all my coworkers say it sucks :-)
>> Now I accept that, because it has no way to be adaptive.
>>
>> Consider the scenario where we compile the llvm compiler.
We use "make -j16" on a
>> computer with 8 processors; each make job invokes a compiler which may
>> blindly spawn 16 threads! So we end up with 16*16 threads.
>>
> Determining the right parallelism is not the job of the compiler
> (builtin) nor that of a developer -- the underlying build system
> should take care of the scheduling :)
>
> David

People told me we have some magic lib which is able to figure out how
heavy the load on the system is. The compiler just needs to link against
it. I have not yet had a chance to try these libs, nor do I know their
names at this moment.

Since how to compile post-LTO is a small part of this project, I'd like
to pick whichever is easier first. We may (very likely) end up
implementing all three possible approaches.

From rkotler at mips.com Tue Jul 16 14:10:31 2013
From: rkotler at mips.com (Reed Kotler)
Date: Tue, 16 Jul 2013 14:10:31 -0700
Subject: [LLVMdev] eclipse and gdb
In-Reply-To: <8CC341AF-45FF-4C84-9E0A-D5143A6D20CD@apple.com>
References: <51E528ED.5060307@mips.com> <8CC341AF-45FF-4C84-9E0A-D5143A6D20CD@apple.com>
Message-ID: <51E5B6C7.6000404@mips.com>

On 07/16/2013 05:21 AM, Tilmann Scheller wrote:
> Hi Reed,
>
> I’ve used Eclipse for a long time to do LLVM development on Linux (both
> for code navigation/editing and debugging), any recent Linux
> distribution and version of Eclipse should be fine (even older versions
> should be good enough as this has been working for many years).
>
> Xcode works fine as well, I started to use Xcode exclusively when I
> switched to OS X.
>
> The key to making this work is to use CMake to generate project files
> for Eclipse/Xcode, you can do this by specifying the appropriate
> generator on the command line e.g. -G Xcode or -G "Eclipse CDT4 - Unix
> Makefiles”. Then you can just open the generated project file. Mind you,
> the generated projects are kind of ugly e.g. the Xcode project has like
> more than 200 targets but apart from that they are working fine.
> > In terms of key bindings both Eclipse and Xcode ship with Emacs key bindings and there are plugins which allow you to use vim key bindings as well. With Eclipse I’ve been using the Viable plugin for that and for Xcode there is Xvim. > > Hope this helps :) > > Regards, > > Tilmann > The source browsing is way better this way. This following pointer may be useful to others to complete the importing of the project. http://www.vtk.org/Wiki/Eclipse_CDT4_Generator How are you setting up the debugger? For example, if you want to run from clang but debug the back end code generation ? Which process launcher? Protocol == mi? BTW: do you do builds inside of eclipse. Seems to be kind of slow. Tia. Reed > On Jul 16, 2013, at 1:05 PM, reed kotler wrote: > >> Is anyone using Eclipse and gdb to debug llvm/clang? >> If so, which version of Eclipse, gdb and linux flavor. >> >> I just use gdb currently. >> >> I'm going to try using my mac also. >> Is anyone using xcode/lldb to debug llvm/clang? >> >> Tia. >> >> Reed >> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From rkotler at mips.com Tue Jul 16 15:08:42 2013 From: rkotler at mips.com (Reed Kotler) Date: Tue, 16 Jul 2013 15:08:42 -0700 Subject: [LLVMdev] eclipse and gdb In-Reply-To: <51E5B6C7.6000404@mips.com> References: <51E528ED.5060307@mips.com> <8CC341AF-45FF-4C84-9E0A-D5143A6D20CD@apple.com> <51E5B6C7.6000404@mips.com> Message-ID: <51E5C46A.4050902@mips.com> I made wiki pages on this. 
https://dmz-portal.mips.com/wiki/Building_with_Cmake_to_create_an_Eclipse_Project
https://dmz-portal.mips.com/wiki/Importing_the_Project_into_Eclipse

On 07/16/2013 02:10 PM, Reed Kotler wrote:
> [...]

From nlewycky at google.com Tue Jul 16 15:32:31 2013
From: nlewycky at google.com (Nick Lewycky)
Date: Tue, 16 Jul 2013 15:32:31 -0700
Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage.
In-Reply-To: <51E087E2.5040101@gmail.com>
References: <51E087E2.5040101@gmail.com>
Message-ID: 

On 12 July 2013 15:49, Shuxin Yang wrote:

> Hi, There:
>
> This is the proposal for parallelizing the post-IPO stage. See the
> following for details.
>
> I also attach a toy-grade rudimentary implementation. This
> implementation can be used to illustrate some concepts here. This
> patch is not going to be committed.
>
> Unfortunately, this weekend I will be too busy to read emails. Please
> do not construe a delayed response as being rude :-).
>
> Thanks a lot in advance for your time and insightful comments!
>
> Shuxin
>
>
> The proposal
> ------------
> It is organized as following:
> 1) background info, if you heard "/usr/bin/ls", please skip it
> 2) the motivation of parallelizing the post-IPO stage
> 3) how to parallelize post-IPO
> 4) the linker problems.
> 5) the toy-grade rudimentary implementation
> 6) misc
>
> 1. Some background
> ------------------
>
> The Interprocedural-optimization compilation, aka IPO or IPA,
> typically consists of three stages:
>
> S1) pre-IPO
> Each function goes through some analysis and not-very-aggressive
> optimizations. Some information is collected during this stage; this
> info will be passed on to the IPO stages.
> This info is usually called summary info.
>
> The result of this stage is "fake-objects", which are binary files
> using some known object format to encapsulate IR as well as summary
> info along with other stuff.
>
> S2) IPO:
> Compiler works with the linker to resolve and merge symbols in the
> "fake-objects".
>
> Then Interprocedural analyses (IPA) are invoked to perform
> interprocedural analysis, either based on the summary info or directly
> on the IR.
>
> Interprocedural optimizations (IPO) are called based on the IPA result.
>
> In some compilers, IPA and IPO are separated. One reason is that many
> IPAs can be conducted directly on the concise summary info, while many
> IPOs need to load IRs and bulky annotation/metadata into memory.
>
> S3) post-IPO:
> Typically consists of Loop-Nest-Opt, Scalar Opt, Code-Gen, etc. While
> they are intra-procedural analyses/optimizers, they may directly
> benefit from the info collected in the IPO stages and passed down the
> road.
>
> LLVM collectively calls S2 and S3 "LTO CodeGen", which is very
> confusing.
>
> 2. Why parallelize the post-IPO stage
> =====================================
>
> R1) To improve scalability.
> It is quite obvious that we are not able to put everything about a
> monster program in memory at once.
>
> Even if you can afford an expensive computer, the address space of a
> single compiler process cannot accommodate a monster program.
>
> R2) To take advantage of ample HW resources to shorten compile time.
> R3) To make debugging a lot easier.
> One can triage problems in a much smaller partition rather than
> the huge monster program.
>
> This proposal is not able to achieve goal R1 at this moment, because
> during the IPO stage the compiler currently brings everything into
> memory at once.
>
> 3. How to parallelize the post-IPO stage
> ========================================
>
> From 5k' high, the concept is very simple:
> step 1) divide the merged IR into small pieces,
> step 2) compile each of these pieces independently,
> step 3) feed the objects of each piece back to the linker to be linked
> into an executable, or a dynamic lib.
>
> Section 3.1 through 3.3 describe these three steps respectively.

Yes, this is one approach. I think others at Google have looked into this sort of partitioning with GCC and found that the one thing which really helps you pick the partitions is to profile the program and make the partitions based on the actual paths of functions seen by the profile. I don't think they saw much improvement without that. See http://research.google.com/pubs/pub36355.html .

I do want to mention some other things we can do to parallelize. You may already know of these and have considered them and rejected them before deciding on the design you emailed out here, but I want to list them since there's already another thread with a different approach on the mailing list.

* Use what we have. LTO partitions at a time, directories for instance, on the premise that LTO'ing them will produce something smaller than the sum of its inputs. Then when you do the final whole-program step, it will be receiving smaller inputs than it otherwise would. The change to LLVM here is to fix the inliner (or other optimizations?) to reduce the chance that LTO produces an output larger than its input.

* Parallelize the per-function code generator within a single LLVMContext. CodeGen presently operates on a per-function basis, and is structured as an analysis over llvm IR. There shouldn't be any global state, and there shouldn't be any need for locking accesses to the IR since nothing will be mutating it. This will even speed up clang -O0, and is a solid first step that gets a thread-creation API into LLVM.
  - What happened to "one LLVMContext per thread"? Okay, that rule is a little white lie. Always was.
LLVMContext allows two libraries to use llvm under the hood without interfering with each other (even to the point of separate maps of types to avoid one library from causing slow type lookups in the other). LLVM also doesn't have locking around accesses to the IR, and few guarantees how many things a single mutating operation will need to look at or change, but with those caveats in mind it is possible to share a context across threads. Because CodeGen is structured as an analysis over the IR without mutating the IR, it should work. There's probably still global state in the code generator somewhere, but it's not a structural problem. * Parallelize the function passes, and SCCs that are siblings in the call tree (again within a single LLVMContext). The gnarly part of this is that globals have shared use-lists which are updated as we modify each function individually. Those globals either need to have locks on their use-lists, replaced with a lockless list, or removed entirely. Probably both, as GlobalVariable's have use-lists we actually use in the optimizers, but we don't actually need the use-list for "i32 0". * Parallelize ModulePasses by splitting them into an analysis phase and an optimization phase. Make each per-TU build emit the .bc as usual plus an analysis-file (for instance, call graph, or "which functions reads/modify which globals"). Merge all the analysis-files and farm them back out to be used as input to the programs optimizing each .bc individually -- but now they have total knowledge of the whole-program call graph and other AA information, etc. - You can combine this with an RPC layer to give each worker the ability to ask for the definition of a function from another worker. LLVM already supports "unmaterialized" functions where the definition is loaded lazily. 
The analysis part should arrange to give us enough information to determine whether we want to do the inlining, then if we decide to materialize the function we get its body from another worker.

* Parallelize by splitting into different LLVMContexts. This spares us the difficulty of adding locks or otherwise changing the internals of LLVM, gets us the ability to spread the load across multiple machines, and if combined with the RPC idea above you can also get good inlining without necessarily loading the whole program into memory on a single machine.

I'm not planning to do the work any time soon so count my vote with that in mind, but if you ask me I think the first step should be to parallelize the backend within a single LLVMContext first, then to parallelize the function passes and CGSCC passes (across siblings only of course) second. Removing the use-list from simple constants is a very interesting thing to do to decrease lock contention, but we may want to do something smarter than just remove it -- consider emitting a large constant that is only used by an inline function. It is possible to emit a table of constants in the same COMDAT group as the function, then if the inline function is discarded by the linker the constants are discarded with it. I don't have a concrete proposal for that.

Nick

3.1. Partitioning
> -----------------
> Partitioning is to cut a reasonably-sized chunk from the big merged IR.
> It roughly consists of two steps: 1) determine the partition scheme,
> which is the relatively easy step, and 2) physically scoop the
> partition out of the merged IR, which is much more involved.
>
> 3.1.1 Figure out the partition scheme
> -------------------------------------
> We randomly pick some functions and put them in a partition.
> It would be nice to perform some optimization at this moment. One opt
> in my mind is to reorder functions in order to reduce working-set and
> improve locality.
>
> Unfortunately, this opt seems to be a bit blind at this time, because
> - the CallGraph is not annotated with estimated or profiled frequency,
> - some linkers don't respect the order. It seems they just
> remember the function order of the pristine input obj/fake-obj,
> and enforce this order at the final link (link-exec/shared-lib) stage.
>
> Anyway, I try to ignore all these problems, and try to perform the
> partitioning via the following steps. Maybe we have some luck on some
> platforms:
>
> o. DFS the call-graph, ignoring the self-recursive edges; if freq is
> available, prioritize the edges (i.e. corresponding to call-sites)
> such that frequent edges are visited first.
>
> o. Cut the DFS spanning tree obtained from the previous step bottom-up.
> Each cut/partition contains a reasonable # of functions, and the
> aggregate size of the functions of the partition should not exceed a
> predefined threshold.
>
> o. Repeat the previous step until the call-graph's DFS spanning tree
> is empty.
>
> 3.1.2 Partition transformation
> ------------------------------
>
> This is a bit involved. There are a bunch of problems we have to tackle.
> 1) When the use/def of a symbol are separated into different modules,
> its attributes, like linkage and visibility, need to be changed
> as well.
>
> [Example 1] If a symbol is flagged as "internal" to the module where
> it is defined, the linkage needs to be changed into "internal"
> to the executable/lib being compiled.
>
> [Example 2] For compile-time constants, their initialized value
> needs to be cloned to the partitions where it is referenced.
> The rationale is to let the post-IPO passes take advantage
> of the initialized value to squeeze out some performance.
>
> In order to not bloat the code size, the cloned constant should be
> marked "don't emit". [end of eg2]
>
> Being able to precisely update symbols' attributes is not only
> vital to correctness, it has a significant impact on performance as
> well.
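[Editorial aside: the attribute fix-up in [Example 1] just quoted can be modeled with a toy sketch like the following. This is hypothetical code, not from Shuxin's patch and not the LLVM API; giving the promoted symbol hidden visibility is one plausible way to keep it internal to the final executable/lib.]

```cpp
// Toy model of the cross-partition linkage fix-up: a symbol that was
// "internal" to the merged module must be promoted when its def and a
// use land in different partitions, yet stay hidden from the final
// link interface of the executable/shared lib.
#include <string>

enum class Linkage { Internal, External };
enum class Visibility { Default, Hidden };

struct Symbol {
  std::string name;
  Linkage linkage;
  Visibility visibility;
  int defPartition;   // partition holding the definition
};

// Promote an internal symbol referenced from another partition.
void adjustForPartitioning(Symbol &sym, int usePartition) {
  if (sym.linkage == Linkage::Internal &&
      usePartition != sym.defPartition) {
    sym.linkage = Linkage::External;      // visible across partition objects
    sym.visibility = Visibility::Hidden;  // but not exported from the binary
  }
}
```

A symbol whose uses all stay in its defining partition is left untouched, which matches the correctness/performance concern above: promoting more symbols than necessary would inhibit later intra-partition optimization.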
>
> I have not yet investigated this issue thoroughly. My rudimentary
> implementation simply flags a symbol "external" when its use/def are
> separated into different modules. I believe this is one of the most
> difficult parts of this work. I guess it is going to take a long time
> to become stable.
>
> 2) In order to compile each partition in a separate thread (see
> Section 3.2), we have to put partitions in distinct LLVMContexts.
>
> I could be wrong, but I can't find code which is able to perform
> function cloning across LLVMContexts.
>
> My workaround in the patch is to perform function cloning in one
> LLVMContext (but in a different Module, of course), then save the
> module to a disk file, and load it into memory using a new
> LLVMContext.
>
> It is a bit circuitous and expensive.
>
> One random observation:
> Currently, function-scoped static variables are considered
> "global variables". When cloning a function with a static variable,
> the compiler has no idea if the static variable is used only by
> the function being cloned, and hence separates the function
> and the variable.
>
> I guess it would be nice if we organized symbols by their scope
> instead of their lifetime. It would be convenient for this situation.
>
> 3.2 Compile partitions independently
> ------------------------------------
>
> There are two camps: one camp advocates compiling partitions via
> multi-process, the other favors multi-thread.
>
> Inside Apple compiler teams, I'm the only one belonging to the 1st
> camp. I think while multi-proc sounds a bit red-neck, it has its
> advantages for this purpose, and while multi-thread is certainly more
> eye-popping, it has its advantages as well.
>
> The advantages of multi-proc are:
> 1) easier to implement; each process runs in its own address space.
> We don't need to worry about them interfering with each other.
>
> 2) huge, if not unlimited, address space.
>
> The disadvantage is that it's expensive.
But I guess the cost is
> almost negligible compared to the overall IPO compilation.
>
> The advantages of multi-threads I can imagine are:
> 1) sounds fancy
> 2) it is light-weight
> 3) inter-thread communication is easier than IPC.
>
> Its disadvantages are:
> 1) Oftentimes we will come across a race condition, and it takes an
> awfully long time to figure it out. While the code is supposed
> to be multi-thread safe, we might miss some tricky case.
> Troubleshooting race conditions is a nightmare.
>
> 2) Small address space. This is a big problem if the compiler
> is built 32-bit. In that case, the compiler is not able to bring
> lots of stuff into memory even if the HW does provide ample mem.
>
> 3) The thread-safe run-time lib is more expensive.
> I once linked a compiler using -lpthread (I did not have to) on a
> UNIX platform, and saw the compiler slow down by about 1/3.
>
> I'm not able to convince the folks in the other camp, neither are they
> able to convince me. I decided to implement both. Fortunately, this
> part is not difficult; it seems to be rather easy to crank out one
> within a short period of time. It would be interesting to compare them
> side-by-side, and see which camp loses :-). On the other hand, if we
> run into race-condition problems, we choose the multi-proc version as
> a fall-back.
>
> Regardless of which technique is used to compile partitions
> independently, in order to judiciously and adaptively choose an
> appropriate parallel factor, the compiler certainly needs a lib which
> is able to figure out the load the entire system is under. I don't
> know if there is such a magic lib or not.
>
> 4. The tale of two kinds of linkers
> -----------------------------------
>
> As far as I can tell, llvm supports two kinds of linkers for its IPO
> compilation, and the support is embodied by two sets of
> APIs/interfaces.
>
> o. Interface 1, that stuff named lto_xxxx().
> o. The GNU gold interface.
> The compiler interacts with GNU gold via the adapter implemented
> in tools/gold/gold-plugin.cpp.
>
> This adapter calls interface 1 to control the IPO process.
> It does not have to call the interface APIs; I think it is definitely
> ok for it to call internal functions.
>
> The compiler used to generate a single object file from the merged
> IR; now it will generate multiple of them, one for each partition.
>
> So, interface 1 is *NOT* sufficient any more.
>
> For gold linker users, it is easy to make them happy just by
> hacking the adapter, informing the linker of the new input object
> files. This is done transparently; the users don't need to install a
> new ld.
>
> For those systems which invoke ld interacting with the
> libLTO.{so,dylib}, it has to accept the new APIs I added to
> interface 1 in order to enable the new functionality. Or maybe we can
> invoke '/the/path/to/ld -r *.o -o merged.o' and feed the merged.o to
> the linker (this will keep the interface intact)? Unfortunately, it
> does not work at all: how can I know the path to the ld? The
> libLTO.{so,dylib} is invoked as a plugin; it cannot see the argv.
> How about hacking around this by adding a nasty flag pointing to the
> right ld? Well, it works. However, I don't believe many people would
> like to do it this way; that means I lose the huge number of "QA"
> folks who are working hard on this compiler.
>
> What's wrong with interface 1? The ld side is more active than
> the compiler side; however, in concept the IPO is driven by the
> compiler side. This means this interface is changing over time.
>
> In contrast, the gold interface (as I reverse-engineered from the
> adapter code) is more symbol-centric, taking little of the IPO-thing
> into account. That interface is simple and stable.
>
> 5. The rudimentary implementation
> ---------------------------------
>
> I made it work for bzip2 from cpu2kint yesterday. bzip2 is a "tiny"
> program; I intentionally lowered the partition size to get 3
> partitions.
> There is no comment in the code, and it definitely needs a rewrite. I
> just checked the correctness (with the ref input), and I haven't
> measured how much it degrades the performance (due to the problem I
> have not yet got a chance to tackle, see section 3.1.2, the symbol
> attribute stuff).
>
> The control flow basically is:
> 1) add a module pass to the IPO pass-manager, and figure
> out the partition scheme.
>
> 2) physically partition the merged module.
> The IR and the obj of each partition are placed in a new dir,
> "llvmipo" by default
>
> --
> ls llvmipo/
> Makefile merged part1.bc part1.o part2.bc part2.o
> part3.bc part3.o
> --
>
> 3) For demo purposes, I drive the post-IPO stage via a makefile, which
> encapsulates hacks and other nasty stuff.
>
> NOTE that the post-IPO pass in my hack contains only the CodeGen
> pass; we need to reorganize
> PassManagerBuilder::populateLTOPassManager(), which intermingles IPO
> passes along with intra-proc scalar passes -- we need to separate
> them and move the intra-proc scalar passes to the post-IPO stage.
>
>
> .PHONY = all
>
> BC = part1.bc part2.bc part3.bc
> OBJ = ${BC:.bc=.o}
>
> all : merged
> %.o : %.bc
> 	$(HOME)/tmp/lto.llc -filetype=obj $+ -o $@
>
> merged : $(OBJ)
> 	/usr/bin/ld $+ -r -o $@
>
> 4) As the Makefile suggests, the *.o files of the partitions are
> linked into a single obj "merged" and fed back to the linker.
>
>
> 6) Miscellaneous
> ================
> Will partitioning degrade performance in theory? I think it depends
> on the definition of performance. If performance means execution
> time, I guess it does not. However, if performance includes code
> size, I think it may have some negative impact. Following are a few
> scenarios:
>
> - constants generated by the post-IPO passes are not shared across
> partitions
> - dead functions may be detected during the post-IPO stage, and they
> may not be deleted.
> > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shuxin.llvm at gmail.com Tue Jul 16 16:07:05 2013 From: shuxin.llvm at gmail.com (Shuxin Yang) Date: Tue, 16 Jul 2013 16:07:05 -0700 Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage. In-Reply-To: References: <51E087E2.5040101@gmail.com> Message-ID: <51E5D219.4050109@gmail.com> On 7/16/13 3:32 PM, Nick Lewycky wrote: > On 12 July 2013 15:49, Shuxin Yang > wrote: > > Hi, There: > > This is the proposal for parallelizing post-ipo stage. See the > following for details. > > I also attach a toy-grade rudimentary implementation. This > implementation can be > used to illustrate some concepts here. This patch is not going to > be committed. > > Unfortunately, this weekend I will be too busy to read emails. > Please do not construe > delayed response as being rude :-). > > Thanks a lot in advance for your time insightful comments! > > Shuxin > > > The proposal > ------------ > It is organized as following: > 1) background info, if you heard "/usr/bin/ls", please skip it > 2) the motivation of parallelize post-IPO stage > 3) how to parallelize post-IPO > 4) the linker problems. > 5) the toy-grade rudimentary implementation > 6) misc > > 1.Some background > ------------------ > > The Interprocedural-optimization compilation, aka IPO or IPA, > typically > consists of three stages: > > S1) pre-IPO > Each function goes through some analysis and > not-very-aggressive optimizations. > Some information is collected during this stage, this info > will be to IPO stages. > This info is usually called summary info. > > The result of this stage is "fake-objects" which is binary > files using > some known object format to encapsulate IR as well as summary > info along with > other stuff. 
> > S2) IPO: > Compiler works with linker to resolve and merge symbols in the > "fake-objects" > > Then Interprocedural analyses (IPA) are invoked to perform > interprocedural > analysis either based on the summary-info, or directly on the IR. > > Interprocedural optimizations (IPO) are called based on the > IPA result. > > In some compilers, IPA and IPO are separated. One reason is > that many IPAs can > be directly conduct on the concise summary info, while many > IPOs need to load > IRs and bulky annotation/metadata into memory. > > S3) post-IPO: > Typically consist of Loop-Nest-Opt, Scalar Opt, Code-Gen etc > etc. While they > are intra-procedural analyses/optimizers, they may directly > benefit from > the info collected in the IPO stages and pass down the road. > > LLVM collectively call S2 and S3 as "LTO CodeGen", which is very > confusing. > > 2. Why parallelize post-IPO stage > ============================== > > R1) To improve the scalarbility > It is quite obvious that we are not able to put everything > about a monster > program in memory at once. > > Even if you can afford a expensive computer, the address space > of a > single compiler process cannot accommodate a monster program. > > R2) to take advantage of ample HW resource to shorten compile time. > R3) make debugging lot easier. > One can triage problems in a much smaller partition rather than > the huge monster program. > > This proposal is not able to shoot the goal R1 at this moment, > because during > the IPO stage, currently the compiler brings everything into > memory at once. > > 3. How to parallelize post-IPO stage > ==================================== > > From 5k' high, the concept is very simple, just to > step 1).divide the merged IR into small pieces, > step 2).and compile each of this pieces independendly. > step 3) the objects of each piece are fed back to linker to are > linked > into an executable, or a dynamic lib. 
> > Section 3.1 through 3.3 describe these three steps respectively. > > > Yes, this is one approach. I think others at Google have looked into > this sort of partitioning with GCC and found that the one thing which > really helps you pick the partitions, is to profile the program and > make the partitions based on the actual paths of functions seen by the > profile. I don't think they saw much improvement without that. See > http://research.google.com/pubs/pub36355.html . > > I do want to mention some other things we can do to parallelize. You > may already know of these and have considered them and rejected them > before deciding on the design you emailed out here, but I want to list > them since there's already another thread with a different approach on > the mailing list. > > * Use what we have. LTO partitions at a time, directories for > instance, on the premise that LTO'ing them will produce something > smaller than the sum of its inputs. Then when you do the final > whole-program step, it will be receiving smaller inputs than it > otherwise would. The change to LLVM here is to fix the inliner (or > other optimizations?) to reduce the chance that LTO produces an output > larger than its input. > > * Parallelize the per-function code generator within a single > LLVMContext. CodeGen presently operates on a per-function basis, and > is structured as an analysis over llvm IR. There shouldn't be any > global state, and there shouldn't be any need for locking accesses to > the IR since nothing will be mutating it. This will even speed up > clang -O0, and is a solid first step that gets a thread-creation API > into LLVM. > - What happened to "one LLVMContext per thread"? Okay, that rule is > a little white lie. Always was. LLVMContext allows two libraries to > use llvm under the hood without interfering with each other (even to > the point of separate maps of types to avoid one library from causing > slow type lookups in the other). 
LLVM also doesn't have locking around > accesses to the IR, and makes few guarantees about how many things a single > mutating operation will need to look at or change, but with those > caveats in mind it is possible to share a context across threads. > Because CodeGen is structured as an analysis over the IR without > mutating the IR, it should work. There's probably still global state > in the code generator somewhere, but it's not a structural problem. > > * Parallelize the function passes, and SCCs that are siblings in the > call tree (again within a single LLVMContext). The gnarly part of this > is that globals have shared use-lists which are updated as we modify > each function individually. Those globals need their use-lists either > locked, replaced with a lockless list, or removed entirely. > Probably both, as GlobalVariables have use-lists we actually use in > the optimizers, but we don't actually need the use-list for "i32 0". > > * Parallelize ModulePasses by splitting them into an analysis phase > and an optimization phase. Make each per-TU build emit the .bc as > usual plus an analysis-file (for instance, call graph, or "which > functions read/modify which globals"). Merge all the analysis-files > and farm them back out to be used as input to the programs optimizing > each .bc individually -- but now they have total knowledge of the > whole-program call graph and other AA information, etc. > - You can combine this with an RPC layer to give each worker the > ability to ask for the definition of a function from another worker. > LLVM already supports "unmaterialized" functions where the definition > is loaded lazily. The analysis part should arrange to give us enough > information to determine whether we want to do the inlining, then if > we decide to materialize the function we get its body from another worker. > > * Parallelize by splitting into different LLVMContexts.
This spares us > the difficulty of adding locks or otherwise changing the internals of > LLVM, gets us the ability to spread the load across multiple machines, > and if combined with the RPC idea above you can also get good inlining > without necessarily loading the whole program into memory on a single > machine. > > I'm not planning to do the work any time soon so count my vote with > that in mind, but if you ask me I think the first step should be to > parallelize the backend within a single LLVMContext first, then to > parallelize the function passes and CGSCC passes (across siblings only > of course) second. Removing the use-list from simple constants is a > very interesting thing to do to decrease lock contention, but we may > want to do something smarter than just remove it -- consider emitting > a large constant that is only used by an inline function. It is > possible to emit a table of constants in the same COMDAT group as the > function, then if the inline function is discarded by the linker the > constants are discarded with it. I don't have a concrete proposal for > that. > > Nick > > Thank you for sharing your enlightening thoughts. I had heard some of the ideas before; others are quite new to me. I will take this into account as the project moves on. The motivation of this project is not merely to make compilation faster. It is also to: - significantly ease troubleshooting -- I was asked to fix LTO bugs several times, and it almost drove me mad to pinpoint the bug in a huge merged module. It is definitely an unglamorous and painstaking undertaking :-) - this is one step toward better scalability. For now, I don't want to parallelize CodeGen only, as post-IPO scalar opt is compile-time hogging as well. On the other hand, HPC folks may invoke Loop-Nest-Opt/autopar/etc. in the post-IPO stage as well, which are intrinsically very slow breeds. So parallelizing the entire post-IPO stage will make them happy.
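One practical wrinkle in splitting the post-IPO work is producing pieces that take roughly equal time to compile. A greedy longest-processing-time heuristic is a common first cut; the per-function costs below are made-up estimates (e.g. IR instruction counts), not measured data:

```python
import heapq

def balanced_partition(costs, n_pieces):
    """Greedy LPT: assign each function (heaviest first) to the
    currently lightest piece. costs maps function name -> estimated cost."""
    heap = [(0, i) for i in range(n_pieces)]  # (total cost so far, piece index)
    heapq.heapify(heap)
    pieces = [[] for _ in range(n_pieces)]
    for name, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
        total, idx = heapq.heappop(heap)
        pieces[idx].append(name)
        heapq.heappush(heap, (total + cost, idx))
    return pieces

# Hypothetical per-function compile-cost estimates.
costs = {"main": 70, "helper": 30, "kernel": 60, "init": 20, "util": 40}
print(balanced_partition(costs, 2))
# -> [['main', 'helper', 'init'], ['kernel', 'util']] (totals 120 and 100)
```

Cost estimates are only estimates, of course, which is part of why this partitioning problem is called out as difficult later in the thread.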
Finer-grained parallelism could be promising; however, it is too error-prone :-). -------------- next part -------------- An HTML attachment was scrubbed... URL: From qcolombet at apple.com Tue Jul 16 17:21:18 2013 From: qcolombet at apple.com (Quentin Colombet) Date: Tue, 16 Jul 2013 17:21:18 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. Message-ID: Hi, I would like to start a discussion about error/warning reporting in LLVM and how we can extend the current mechanism to take advantage of clang's capabilities. ** Motivation ** Currently LLVM provides a way to report errors either directly (print to stderr) or by using a user-defined error handler. For instance, in inline asm parsing, we can specify the diagnostic handler to report the errors in clang. The basic idea would be to be able to do that for warnings too (and for other kinds of errors?). A motivating example can be found at the following link, where we want LLVM to be able to warn on the stack size to help with developing kernels: http://llvm.org/bugs/show_bug.cgi?id=4072 By adding this capability, we would have access to all the nice features clang provides with warnings: - Promote it to an error. - Ignore it. ** Challenge ** To be able to take advantage of the clang framework for warning/error reporting, warnings have to be associated with warning groups. Thus, we need a way for the backend to specify a front-end warning type. The challenge is, AFAICT (which is not much, I admit), that front-end warning types are statically handled using a tablegen representation. ** Advice Needed ** 1. Decide whether or not we want such capabilities (if we do not, we may just sporadically add support for a new warning/group of warnings/errors). 2. Come up with a plan to implement it (assuming we want it). Thanks for the feedback. Cheers, -Quentin -------------- next part -------------- An HTML attachment was scrubbed...
URL: From atrick at apple.com Tue Jul 16 17:48:25 2013 From: atrick at apple.com (Andrew Trick) Date: Tue, 16 Jul 2013 17:48:25 -0700 Subject: [LLVMdev] General strategy to optimize LLVM IR In-Reply-To: References: <88AE2587-C876-4E7C-A4F9-C864367341E7@grame.fr> Message-ID: <78AE37F9-B9A2-4C68-ABFA-CBC29FEE7E7A@apple.com> On Jul 16, 2013, at 11:07 AM, David Blaikie wrote: > On Tue, Jul 16, 2013 at 8:16 AM, Stéphane Letz wrote: >> Hi, >> >> Our DSL emits sub-optimal LLVM IR that we optimize later on (LLVM IR ==> LLVM IR) before dynamically compiling it with the JIT. We would like to simply follow what clang/clang++ does when compiling with -O1/-O2/-O3 options. Our strategy up to now was to look at the opt.cpp code and take part of it in order to implement our optimization code. >> >> It appears to be rather difficult to follow the evolution of the LLVM IR optimization strategies. With LLVM 3.3 our optimization code does not produce code as fast as the one produced with clang -O3 anymore. Moreover, the new vectorization passes are still not working. >> >> Is there a recommended way to add -O1/-O2/-O3 kinds of optimizations on LLVM IR code? Any code to look at besides the opt.cpp tool? > > I'm not /entirely/ sure what you're asking. It sounds like you're > asking "what passes should my compiler's -O1/2/3 flags correspond to" > and one answer to that is to look at Clang (I think Clang's is > different from opt/llc's, maybe). PassManagerBuilder decides what passes to run. Unfortunately, the clang driver uses a back door to set a bunch of flags that configure PassManagerBuilder. See EmitAssemblyHelper::CreatePasses. I find this extremely difficult to follow and don’t know of any way to derive an equivalent “opt” command line. Good luck. -Andy > >> >> Thanks.
>> >> Stéphane Letz >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From eli.friedman at gmail.com Tue Jul 16 17:51:40 2013 From: eli.friedman at gmail.com (Eli Friedman) Date: Tue, 16 Jul 2013 17:51:40 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: References: Message-ID: On Tue, Jul 16, 2013 at 5:21 PM, Quentin Colombet wrote: > Hi, > > I would like to start a discussion about error/warning reporting in LLVM and > how we can extend the current mechanism to take advantage of clang > capabilities. > > > ** Motivation ** > > Currently LLVM provides a way to report error either directly (print to > stderr) or by using a user defined error handler. For instance, in inline > asm parsing, we can specify the diagnostic handler to report the errors in > clang. > > The basic idea would be to be able to do that for warnings too (and for > other kind of errors?). > A motivating example can be found with the following link where we want LLVM > to be able to warn on the stack size to help developing kernels: > http://llvm.org/bugs/show_bug.cgi?id=4072 > > By adding this capability, we would be able to have access to all the nice > features clang provides with warnings: > - Promote it to an error. > - Ignore it. > > > ** Challenge ** > > To be able to take advantage of clang framework for warning/error reporting, > warnings have to be associated with warning groups. > Thus, we need a way for the backend to specify a front-end warning type. 
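The handler-based scheme being quoted here can be sketched abstractly. Everything below is invented for illustration and is not the clang/LLVM API: a backend reports a diagnostic kind through a caller-installed engine, and the frontend owns the policy of promoting, ignoring, or emitting it:

```python
from enum import Enum

class Severity(Enum):
    IGNORE = 0
    WARNING = 1
    ERROR = 2

class DiagnosticEngine:
    """Invented stand-in for a frontend diagnostic engine: the backend
    only reports a diagnostic kind; the frontend decides presentation."""
    def __init__(self):
        self.mapping = {}   # kind -> Severity (policy, e.g. from -Werror=...)
        self.emitted = []

    def set_severity(self, kind, severity):
        self.mapping[kind] = severity

    def handle(self, kind, message):
        severity = self.mapping.get(kind, Severity.WARNING)
        if severity is Severity.IGNORE:
            return
        self.emitted.append((severity, kind, message))

engine = DiagnosticEngine()
# Frontend policy: promote one diagnostic kind to an error, ignore another.
engine.set_severity("backend-frame-too-large", Severity.ERROR)
engine.set_severity("backend-inline-asm", Severity.IGNORE)

# The "backend" just calls the handler; it never decides how to present anything.
engine.handle("backend-frame-too-large", "stack frame is 9000 bytes")
engine.handle("backend-inline-asm", "deprecated constraint")
print(engine.emitted)  # only the promoted error remains; the ignored kind is dropped
```

This mirrors the separation Eli argues for below: the backend supplies information, the frontend maps it onto warning groups and severities.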
> > The challenge is, AFAICT (which is not much, I admit), that front-end > warning types are statically handled using tablegen representation. > > > ** Advices Needed ** > > 1. Decide whether or not we want such capabilities (if we do not we may just > add sporadically the support for a new warning/group of warning/error). > 2. Come up with a plan to implement that (assuming we want it). The frontend should be presenting warnings, not the backend; adding a hook which provides the appropriate information shouldn't be too hard. Warnings coming out of the backend are very difficult to design well, so I don't expect we will add many. Also, keep in mind that the information coming out of the backend could be used in other ways; it might not make sense for the backend to decide that some piece of information should be presented as a warning. (Consider, for example, IDE integration to provide additional information about functions and loops on demand.) -Eli On Tue, Jul 16, 2013 at 5:21 PM, Quentin Colombet wrote: > Hi, > > I would like to start a discussion about error/warning reporting in LLVM and > how we can extend the current mechanism to take advantage of clang > capabilities. > > > ** Motivation ** > > Currently LLVM provides a way to report error either directly (print to > stderr) or by using a user defined error handler. For instance, in inline > asm parsing, we can specify the diagnostic handler to report the errors in > clang. > > The basic idea would be to be able to do that for warnings too (and for > other kind of errors?). > A motivating example can be found with the following link where we want LLVM > to be able to warn on the stack size to help developing kernels: > http://llvm.org/bugs/show_bug.cgi?id=4072 > > By adding this capability, we would be able to have access to all the nice > features clang provides with warnings: > - Promote it to an error. > - Ignore it. 
> > > ** Challenge ** > > To be able to take advantage of clang framework for warning/error reporting, > warnings have to be associated with warning groups. > Thus, we need a way for the backend to specify a front-end warning type. > > The challenge is, AFAICT (which is not much, I admit), that front-end > warning types are statically handled using tablegen representation. > > > ** Advices Needed ** > > 1. Decide whether or not we want such capabilities (if we do not we may just > add sporadically the support for a new warning/group of warning/error). > 2. Come up with a plan to implement that (assuming we want it). > > > Thanks for the feedbacks. > > Cheers, > > -Quentin > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From rkotler at mips.com Tue Jul 16 17:53:17 2013 From: rkotler at mips.com (Reed Kotler) Date: Tue, 16 Jul 2013 17:53:17 -0700 Subject: [LLVMdev] eclipse and gdb In-Reply-To: <8CC341AF-45FF-4C84-9E0A-D5143A6D20CD@apple.com> References: <51E528ED.5060307@mips.com> <8CC341AF-45FF-4C84-9E0A-D5143A6D20CD@apple.com> Message-ID: <51E5EAFD.3020203@mips.com> On 07/16/2013 05:21 AM, Tilmann Scheller wrote: > Hi Reed, > > I’ve used Eclipse for a long time to do LLVM development on Linux (both for code navigation/editing and debugging), any recent Linux distribution and version of Eclipse should be fine (even older versions should be good enough as this has been working for many years). > > Xcode works fine as well, I started to use Xcode exclusively when I switched to OS X. > > The key to make this work is to use CMake to generate project files for Eclipse/Xcode, you can do this by specifying the appropriate generator on the command line e.g. -G Xcode or -G "Eclipse CDT4 - Unix Makefiles”. Then you can just open the generated project file. Mind you, the generated projects are kind of ugly e.g. 
the Xcode project has like more than 200 targets but apart from that they are working fine. > > In terms of key bindings both Eclipse and Xcode ship with Emacs key bindings and there are plugins which allow you to use vim key bindings as well. With Eclipse I’ve been using the Viable plugin for that and for Xcode there is Xvim. > > Hope this helps :) > > Regards, > > Tilmann Have you had trouble with the C++ indexer getting stuck in some kind of infinite loop while indexing? It's happening to me with CMake (but it happened without CMake too). It seems to be a common problem. I have a giant Mac at home and am thinking of maybe switching to a Mac at work to get a reasonable IDE that is reliable. > > On Jul 16, 2013, at 1:05 PM, reed kotler wrote: > >> Is anyone using Eclipse and gdb to debug llvm/clang? >> If so, which version of Eclipse, gdb and linux flavor. >> >> I just use gdb currently. >> >> I'm going to try using my mac also. >> Is anyone using xcode/lldb to debug llvm/clang? >> >> Tia. >> >> Reed >> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From chandlerc at google.com Tue Jul 16 17:57:59 2013 From: chandlerc at google.com (Chandler Carruth) Date: Tue, 16 Jul 2013 17:57:59 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: References: Message-ID: On Tue, Jul 16, 2013 at 5:51 PM, Eli Friedman wrote: > > 1. Decide whether or not we want such capabilities (if we do not we may > just > > add sporadically the support for a new warning/group of warning/error). > > 2. Come up with a plan to implement that (assuming we want it). > > The frontend should be presenting warnings, not the backend; adding a > hook which provides the appropriate information shouldn't be too hard. > Warnings coming out of the backend are very difficult to design well, > so I don't expect we will add many.
Also, keep in mind that the > information coming out of the backend could be used in other ways; it > might not make sense for the backend to decide that some piece of > information should be presented as a warning. (Consider, for example, > IDE integration to provide additional information about functions and > loops on demand.) > > -Eli I really like this design, where essentially the frontend can query the backend for very simple information using a nice API, and then emit the warning itself. I'm happy for the warning to be in the CodeGen layer of the frontend, and only be reachable when generating code for that function body. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rkotler at mips.com Tue Jul 16 18:01:14 2013 From: rkotler at mips.com (Reed Kotler) Date: Tue, 16 Jul 2013 18:01:14 -0700 Subject: [LLVMdev] eclipse and gdb In-Reply-To: <51E5EAFD.3020203@mips.com> References: <51E528ED.5060307@mips.com> <8CC341AF-45FF-4C84-9E0A-D5143A6D20CD@apple.com> <51E5EAFD.3020203@mips.com> Message-ID: <51E5ECDA.5000107@mips.com> The Eclipse indexer seems to get stuck in the Clang unittests/AST >> Hope this helps :) >> >> Regards, >> >> Tilmann > > Have you had trouble with the C++ indexer getting stuck in some kind of > infinite loop indexing. > > It's happening to me with this Cmake (but happened too without cmake). > > Seems to be a common problem. > > I have a giant mac at home and am thinking of maybe switching to mac at > work to get a reasonable IDE that is reliable. > >> >> On Jul 16, 2013, at 1:05 PM, reed kotler wrote: >> >>> Is anyone using Eclipse and gdb to debug llvm/clang? >>> If so, which version of Eclipse, gdb and linux flavor. >>> >>> I just use gdb currently. >>> >>> I'm going to try using my mac also. >>> Is anyone using xcode/lldb to debug llvm/clang? >>> >>> Tia. 
>>> >>> Reed >>> >>> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From atcuno at gmail.com Tue Jul 16 18:02:13 2013 From: atcuno at gmail.com (Andrew Case) Date: Tue, 16 Jul 2013 20:02:13 -0500 Subject: [LLVMdev] Setting endian/byte order through disassemble command? Message-ID: Hello, I am working on auto-analysis with lldb using ARM (thumb) disassembly, but am having problems with the 'disassemble' command. It seems that llvm is defaulting to big-endian processing, and I cannot figure out how to switch the mode to little endian. Here is the output: (lldb) disassemble -A thumb -b -s 0x687f4 -e 0x68808 testfile[0x687f4]: 0x4bbe .short 0x4bbe ; unknown opcode testfile[0x687f6]: 0x4abf .short 0x4abf ; unknown opcode testfile[0x687f8]: 0x447b .short 0x447b ; unknown opcode testfile[0x687fa]: 0x49bf .short 0x49bf ; unknown opcode testfile[0x687fc]: 0x4ff0e92d .long 0x4ff0e92d ; unknown opcode testfile[0x68800]: 0xb0a7 .short 0xb0a7 ; unknown opcode testfile[0x68802]: 0x589e .short 0x589e ; unknown opcode testfile[0x68804]: 0x6830 .short 0x6830 ; unknown opcode testfile[0x68806]: 0x9025 .short 0x9025 ; unknown opcode ----------- As you can see, none of the opcodes are recognized, as each 2-byte pair is backwards. When I process the file with IDA Pro, it treats the code as thumb little-endian, the byte pairs are switched, and the disassembly looks as expected. Also, this is kind of a different question, but my analysis currently works through Python scripting and eventually calls out to the 'disassemble' command. Is there a more structured API in Python for disassembly (e.g. getting parsed information for each instruction)? Thanks for any help! From atrick at apple.com Tue Jul 16 18:07:45 2013 From: atrick at apple.com (Andrew Trick) Date: Tue, 16 Jul 2013 18:07:45 -0700 Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage.
In-Reply-To: References: <51E087E2.5040101@gmail.com> Message-ID: On Jul 16, 2013, at 3:32 PM, Nick Lewycky wrote: > * Parallelize ModulePasses by splitting them into an analysis phase and an optimization phase. Make each per-TU build emit the .bc as usual plus an analysis-file (for instance, call graph, or "which functions read/modify which globals"). Merge all the analysis-files and farm them back out to be used as input to the programs optimizing each .bc individually -- but now they have total knowledge of the whole-program call graph and other AA information, etc. Shuxin presented the same idea to me. It offers the best opportunity to scale while sidestepping concurrency issues. It avoids the problem of loading all functions during the IPO phase and then cloning them into separate LLVMContexts later. I don’t see this as part of the first milestone though. -Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: From rkotler at mips.com Tue Jul 16 18:19:44 2013 From: rkotler at mips.com (Reed Kotler) Date: Tue, 16 Jul 2013 18:19:44 -0700 Subject: [LLVMdev] eclipse and gdb In-Reply-To: <51E5ECDA.5000107@mips.com> References: <51E528ED.5060307@mips.com> <8CC341AF-45FF-4C84-9E0A-D5143A6D20CD@apple.com> <51E5EAFD.3020203@mips.com> <51E5ECDA.5000107@mips.com> Message-ID: <51E5F130.1080400@mips.com> On 07/16/2013 06:01 PM, Reed Kotler wrote: > The Eclipse indexer seems to get stuck in the Clang unittests/AST > In Eclipse you can tell it that a given directory is derived, and then it won't try and index it. Probably the more complex clang tests are too involved for the indexer. >>>> Hope this helps :) >>>> >>>> Regards, >>>> >>>> Tilmann >>> >>> Have you had trouble with the C++ indexer getting stuck in some kind of >>> infinite loop indexing. >>> >>> It's happening to me with this Cmake (but happened too without cmake). >>> >>> Seems to be a common problem.
>> >> I have a giant mac at home and am thinking of maybe switching to mac at >> work to get a reasonable IDE that is reliable. >> >>> >>> On Jul 16, 2013, at 1:05 PM, reed kotler wrote: >>> >>>> Is anyone using Eclipse and gdb to debug llvm/clang? >>>> If so, which version of Eclipse, gdb and linux flavor. >>>> >>>> I just use gdb currently. >>>> >>>> I'm going to try using my mac also. >>>> Is anyone using xcode/lldb to debug llvm/clang? >>>> >>>> Tia. >>>> >>>> Reed >>>> >>>> >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From maple.hl at gmail.com Tue Jul 16 18:29:22 2013 From: maple.hl at gmail.com (=?UTF-8?B?5bCP5Yia?=) Date: Wed, 17 Jul 2013 09:29:22 +0800 Subject: [LLVMdev] make lldb work In-Reply-To: References: Message-ID: The host is a 32-bit box, and the debug program is 32-bit as well. The `breakpoint list` command works well. Actually, I've compiled lldb from source, which doesn't change the situation. So this is very possibly a 32-bit bug. I think I should find a 64-bit Linux box and try again. And I may also try to locate the problem, which will take some time, since I'm new to this. ―― Beauty comes in two kinds: one is a profound and moving equation, the other is your faintly weary smile. On Wed, Jul 17, 2013 at 2:04 AM, Malea, Daniel wrote: > Hi, > > I notice you're running a 32-bit program; are you also on a 32-bit host, > or do you have a 64-bit OS installed? We don't generally test on 32-bit > hosts, so it's possible you found a new bug. In addition, there are some > known bugs with debugging 32-bit programs (even on 64-bit hosts) which > we will hopefully be resolving soon. > > Nonetheless, I was unable to reproduce the behaviour you reported (with > lldb-3.4 Ubuntu package version r186406). What's the output of "breakpoint > list" -- does LLDB resolve any address for the breakpoint?
> > Here is my LLDB session on a 64-bit host debugging a 32-bit program: > > daniel at lautrec:~$ lldb ./a.out > Current executable set to './a.out' (i386). > (lldb) breakpoint set -l 8 > Breakpoint 1: where = a.out`main + 67 at bla.cpp:9, address = 0x080484f3 > (lldb) breakpoint set -l 12 > Breakpoint 2: no locations (pending). > WARNING: Unable to resolve breakpoint to any actual locations. > (lldb) breakpoint list > Current breakpoints: > 1: file = '/home/daniel/bla.cpp', line = 8, locations = 1 > 1.1: where = a.out`main + 67 at bla.cpp:9, address = 0x080484f3, > unresolved, hit count = 0 > > 2: file = '/home/daniel/bla.cpp', line = 12, locations = 0 (pending) > > (lldb) process launch > Process 22954 launched: './a.out' (i386) > Process 22954 stopped > * thread #1: tid = 0x59aa, 0x080484f3 a.out`main(argc=1, argv=0xffa37624) > + 67 at bla.cpp:9, name = 'a.out, stop reason = breakpoint 1.1 > frame #0: 0x080484f3 a.out`main(argc=1, argv=0xffa37624) + 67 at > bla.cpp:9 > 6 while ( counter < 10 ) > 7 counter++; > 8 > -> 9 printf("counter: %d\n", counter); > 10 > 11 return 0; > 12 } > (lldb) > > > > > From: 小刚 > Reply-To: "Maple.HL at gmail.com" > Date: Monday, 15 July, 2013 10:00 PM > To: LLVM List > Subject: [LLVMdev] make lldb work > > Sorry if asked before. > > I'm new to LLDB, try to use it according to the lldb project site. I > write some very simple code like: > > #include > > int main(int argc, char **argv) > { > int counter = 0; > while ( counter < 10 ) > counter++; > > printf("counter: %d\n", counter); > > return 0; > } > > and the session like: > > $ clang -g main.c > $ lldb-3.4 a.out > (lldb) breakpoint set -l 8 > ...... > (lldb) breakpoint set -l 12 > ...... > (lldb) breakpoint list > ...... > (lldb) process launch > Process 1105 launched: '/home/maple/debug/arena/a.out' (i386) > counter: 10 > Process 1105 exited with status = 0 (0x00000000) > > I checked with gdb, it works well. I'm not sure whether it's a bug or my > false command. 
> > I'm using Ubuntu 12.04, and the lldb is from llvm.org/apt. It's > svn186357. > > ―― > Beauty comes in two kinds: > one is a profound and moving equation, > the other is your faintly weary smile. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From peter at uformia.com Tue Jul 16 18:39:57 2013 From: peter at uformia.com (Peter Newman) Date: Wed, 17 Jul 2013 11:39:57 +1000 Subject: [LLVMdev] SIMD instructions and memory alignment on X86 Message-ID: <51E5F5ED.1000808@uformia.com> Hello all, I'm currently in the process of debugging a crash occurring in our program. In LLVM 3.2 and 3.3 it appears that JIT-generated code is attempting to access unaligned memory with an SSE2 instruction. However, this only happens under certain conditions that seem (but may not be) related to the stack's state on calling the function. Our program acts as a front-end, using the LLVM C++ API to generate a JIT-compiled function. This function is primarily mathematical, so we use the Vector types to take advantage of SIMD instructions (as well as a few SSE2 intrinsics). This worked in LLVM 2.8 but started failing in 3.2 and has continued to fail in 3.3. It fails with no optimizations applied to the LLVM Function/Module. It crashes with what is reported as a memory access error (accessing 0xffffffff); however, it's suggested that this is how the SSE fault-raising mechanism appears. The generated instruction varies, but it seems to often be similar to (I don't have it in front of me, sorry): movapd xmm0, xmm[ecx+0x???????] Where the xmm register changes, and the second parameter is a memory access. ECX is always set to 0x7ffffff - however, I don't know if this is part of the SSE error-reporting process or part of the situation causing the error. I haven't worked out exactly what code path etc. is causing this crash. I'm hoping that someone can tell me if there were any changed requirements for working with SIMD in LLVM 3.2 (or earlier, we haven't tried 3.0 or 3.1).
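For context on the instruction named above: movapd faults unless its memory operand is 16-byte aligned, while movupd tolerates any address. The check itself is trivial; the base value below is taken from the report, and the aligned address is a made-up counterexample:

```python
def is_16_byte_aligned(addr):
    """movapd requires its memory operand's address to be a multiple of 16."""
    return addr % 16 == 0

# The base value reported above is odd, so any multiple-of-16 displacement
# off it stays unaligned for a 16-byte-aligned access.
base = 0x7ffffff
assert not is_16_byte_aligned(base)
# A hypothetical 16-byte-aligned address (e.g. from posix_memalign(&p, 16, n)):
assert is_16_byte_aligned(0x7ffffff0)
print(base % 16)   # -> 15
```

Which is consistent with the suspicion in the follow-up message: if a GlobalVariable (or the stack slot) backing the vector load lacks 16-byte alignment, an aligned SSE2 load through it will fault.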
I currently suspect the use of GlobalVariable (we first discovered the crash when using a feature that uses them); however, I have attempted using setAlignment on the GlobalVariables without any change. -- Peter N From xiaofei.wan at intel.com Tue Jul 16 19:48:37 2013 From: xiaofei.wan at intel.com (Wan, Xiaofei) Date: Wed, 17 Jul 2013 02:48:37 +0000 Subject: [LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation In-Reply-To: References: <851E09B5CA368045827A32DA76E440AF019718D4@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF019718EA@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF01971907@SHSMSX104.ccr.corp.intel.com> Message-ID: <851E09B5CA368045827A32DA76E440AF019723EC@SHSMSX104.ccr.corp.intel.com> -----Original Message----- From: Xinliang David Li [mailto:xinliangli at gmail.com] Sent: Wednesday, July 17, 2013 4:18 AM To: Wan, Xiaofei Cc: LLVM Developers Mailing List (llvmdev at cs.uiuc.edu) Subject: Re: [LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation On Tue, Jul 16, 2013 at 3:33 AM, Wan, Xiaofei wrote: > Hi, community: > > For the sake of our business need, I want to enable "Function-based parallel code generation" to speed up the compilation of a single module; please see the details of the design and provide your feedback on the aspects below, thanks! > 1. Is this idea the proper solution for my requirement? 2. This new > feature will be enabled by llc -thd=N and has no impact on the original > llc when -thd=1 3. Can this new feature of llc be accepted by the > community and merged into the LLVM code tree? > > Patches > The patch is divided into four separate parts; the all-in-one patch can be found here: > http://llvm-reviews.chandlerc.com/D1152 > > Design > https://docs.google.com/document/d/1QSkP6AumMCAVpgzwympD5pI3btPJt4SRgj > Y-vhyfySg/edit?usp=sharing > > > Background > 1.
Our business needs to compile C/C++ source files into LLVM IR and link them into a big BC file; the big BC file is then compiled into binary code on different arch/target devices. > 2. Backend code generation is a time-consuming activity that happens on the target device, which makes it important to the user experience. > 3. Make -j or file-based parallelism can't help here since there is only one big BC file; function-based parallel LLVM backend code generation is a good solution to improve compilation time, and it will fully utilize multi-cores. > > Overall design strategy and goal > 1. Generate exactly the same binary as the single-thread output 2. No > impact on single-thread performance & conformance 3. Little impact > on LLVM code infrastructure > > Current status and test result > 1. Parallel llc can generate the same code as a single thread (verified by "objdump > -d"); it could pass a 10-hour stress test for all performance benchmarks > 2. Parallel llc can introduce a ~2.9X performance gain on a Xeon server with > 4 threads Ignoring FE time, which can be fully parallelized, and assuming 10% of compile time is spent in serial module passes and 25% in the CGSCC pass, the maximum speedup that can be gained by using function-level parallelism is less than 3x. Even adding support for parallel compilation of leaves of the CG in the CGSCC pass won't help too much -- the percentage of leaf functions is < 30% in large apps I have seen. Module-based parallelism proposed by Shuxin has a max speedup of 10x, assuming body cloning does not add a lot of overhead and a build farm with hundreds/thousands of nodes is used. [Xiaofei] For SpecCPU2006, I got data showing function passes consume >90% of total time in llc, measured with VTune (I don't enable LTO); here I only consider llc without LTO, and the max parallelism depends on how many threads are started.
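The "less than 3x" bound above is plain Amdahl's law, reading David's estimate as 35% of the time serialized (10% module passes + 25% CGSCC); Xiaofei's ~2.9x on 4 threads is likewise consistent with a >90% parallel fraction:

```python
def amdahl(parallel_fraction, n_threads):
    """Maximum speedup when only parallel_fraction of the work scales."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_threads)

# David's estimate: 10% serial module passes + 25% CGSCC -> 65% parallel.
print(round(amdahl(0.65, 10**9), 2))   # limit as threads -> infinity: 2.86 (< 3x)
# Xiaofei's data: >90% of llc time in parallelizable function passes, 4 threads.
print(round(amdahl(0.90, 4), 2))       # 3.08, consistent with the observed ~2.9x
```

The two positions therefore mostly disagree about the parallel fraction, not about the arithmetic.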
David > > > Thanks > Wan Xiaofei > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From xiaofei.wan at intel.com Tue Jul 16 19:51:17 2013 From: xiaofei.wan at intel.com (Wan, Xiaofei) Date: Wed, 17 Jul 2013 02:51:17 +0000 Subject: [LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation In-Reply-To: <51E587CB.8000505@gmail.com> References: <851E09B5CA368045827A32DA76E440AF019718D4@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF019718EA@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF01971907@SHSMSX104.ccr.corp.intel.com> <3C05A56F-310F-4816-BED5-C7DE530D712C@apple.com> <851E09B5CA368045827A32DA76E440AF01971B33@SHSMSX104.ccr.corp.intel.com> <51E587CB.8000505@gmail.com> Message-ID: <851E09B5CA368045827A32DA76E440AF01972405@SHSMSX104.ccr.corp.intel.com> -----Original Message----- From: Shuxin Yang [mailto:shuxin.llvm at gmail.com] Sent: Wednesday, July 17, 2013 1:50 AM To: Wan, Xiaofei Cc: Evan Cheng; Shuxin Yang; LLVM Developers Mailing List (llvmdev at cs.uiuc.edu) Subject: Re: [LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation On 7/16/13 7:23 AM, Wan, Xiaofei wrote: > Yes, the purpose is similar; we started this job last year. But > Shuxin's solution is module-based (correct me if I am wrong); we > tried this solution and it failed for many reasons, as described in my > design document: > https://docs.google.com/document/d/1QSkP6AumMCAVpgzwympD5pI3btPJt4SRgj > Y-vhyfySg/edit?usp=sharing > > We need to discuss the two solutions, compare them, and then adopt one. > > The biggest differences between module-based parallelism and function-based > parallelism are: 1. how to partition a module into different pieces which > consume similar time; it is a difficult question Why difficult? > 2.
How to make sure the generated binary is the same each time

It depends on what "the same" means. In the merged version, a constant may keep one copy, while in the partitioned version, constants may be duplicated, since the post-IPO passes may generate some constants that cannot be shared with the same constants generated in other partitions.

All these issues don't sound like a problem in practice.

> 3. if 2 can't be achieved, it is difficult to validate the correctness
> of parallelism

It has nothing to do with correctness.

[Xiaofei] Why? I don't understand it very well here; you mean it can generate totally identical binaries to the original llc, including the function order (function order may not affect code quality, but we should make sure the output is the same in each run)?

From rkotler at mips.com Tue Jul 16 19:53:31 2013
From: rkotler at mips.com (Reed Kotler)
Date: Tue, 16 Jul 2013 19:53:31 -0700
Subject: [LLVMdev] eclipse and gdb
In-Reply-To: <51E5F130.1080400@mips.com>
References: <51E528ED.5060307@mips.com> <8CC341AF-45FF-4C84-9E0A-D5143A6D20CD@apple.com> <51E5EAFD.3020203@mips.com> <51E5ECDA.5000107@mips.com> <51E5F130.1080400@mips.com>
Message-ID: <51E6072B.2060004@mips.com>

The last step here was to download Eclipse for C++ directly from Eclipse rather than getting the packages from Ubuntu, which is what I'm developing on. For many other things, if it is not coming from Ubuntu directly, there is something that in the end won't work, but it seems that the latest Eclipse works fine: Kepler 4.3 (but you need to download the C++ part too).

On 07/16/2013 06:19 PM, Reed Kotler wrote:
> On 07/16/2013 06:01 PM, Reed Kotler wrote:
>> The Eclipse indexer seems to get stuck in the Clang unittests/AST
>>
>
> In Eclipse you can tell it that a given directory is derived, and then
> it won't try to index it.
>
> Probably the more complex clang tests are too involved for the indexer.
> >>>> Hope this helps :) >>>> >>>> Regards, >>>> >>>> Tilmann >>> >>> Have you had trouble with the C++ indexer getting stuck in some kind of >>> infinite loop indexing. >>> >>> It's happening to me with this Cmake (but happened too without cmake). >>> >>> Seems to be a common problem. >>> >>> I have a giant mac at home and am thinking of maybe switching to mac at >>> work to get a reasonable IDE that is reliable. >>> >>>> >>>> On Jul 16, 2013, at 1:05 PM, reed kotler wrote: >>>> >>>>> Is anyone using Eclipse and gdb to debug llvm/clang? >>>>> If so, which version of Eclipse, gdb and linux flavor. >>>>> >>>>> I just use gdb currently. >>>>> >>>>> I'm going to try using my mac also. >>>>> Is anyone using xcode/lldb to debug llvm/clang? >>>>> >>>>> Tia. >>>>> >>>>> Reed >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> LLVM Developers mailing list >>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From silvas at purdue.edu Tue Jul 16 20:38:55 2013 From: silvas at purdue.edu (Sean Silva) Date: Tue, 16 Jul 2013 20:38:55 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: References: Message-ID: On Tue, Jul 16, 2013 at 5:21 PM, Quentin Colombet wrote: > > The challenge is, AFAICT (which is not much, I admit), that front-end > warning types are statically handled using tablegen representation. > > They can also be added dynamically if necessary. See DiagnosticsEngine::getCustomDiagID -- Sean Silva -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From shuxin.llvm at gmail.com Tue Jul 16 20:52:08 2013 From: shuxin.llvm at gmail.com (Shuxin Yang) Date: Tue, 16 Jul 2013 20:52:08 -0700 Subject: [LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation In-Reply-To: <851E09B5CA368045827A32DA76E440AF01972405@SHSMSX104.ccr.corp.intel.com> References: <851E09B5CA368045827A32DA76E440AF019718D4@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF019718EA@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF01971907@SHSMSX104.ccr.corp.intel.com> <3C05A56F-310F-4816-BED5-C7DE530D712C@apple.com> <851E09B5CA368045827A32DA76E440AF01971B33@SHSMSX104.ccr.corp.intel.com> <51E587CB.8000505@gmail.com> <851E09B5CA368045827A32DA76E440AF01972405@SHSMSX104.ccr.corp.intel.com> Message-ID: <51E614E8.8060202@gmail.com> On 7/16/13 7:51 PM, Wan, Xiaofei wrote: > > -----Original Message----- > From: Shuxin Yang [mailto:shuxin.llvm at gmail.com] > Sent: Wednesday, July 17, 2013 1:50 AM > To: Wan, Xiaofei > Cc: Evan Cheng; Shuxin Yang; LLVM Developers Mailing List (llvmdev at cs.uiuc.edu) > Subject: Re: [LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation > > > On 7/16/13 7:23 AM, Wan, Xiaofei wrote: >> Yes, the purpose is similar, we started this job from last year; But >> it Shuxin's solution is module based (correct me if I am wrong), we >> tried this solution and failed for many reasons, it is described in my >> design document >> https://docs.google.com/document/d/1QSkP6AumMCAVpgzwympD5pI3btPJt4SRgj >> Y-vhyfySg/edit?usp=sharing >> >> we need discuss two solution and compare them, then adopt one solution >> >> The biggest difference of module based parallelism and function based >> parallelism are 1. how to partition module into different pieces which >> consume similar time, it is a difficult question > Why difficult? > >> 2. How to make sure the generated binary is same each time > It depends on what is the same. 
In the merged version, a constant may keep one copy, while in the partitioned version, constants may be duplicated, since the post-IPO passes may generate some constants that cannot be shared with the same constants generated in other partitions.
>
> All these issues don't sound like a problem in practice.
>
>> 3. if 2 can't be achieved, it is difficult to validate the correctness
>> of parallelism
> It has nothing to do with correctness.
>
> [Xiaofei] why?

I'm not sure what you are asking here. Are you asking why partitioning still preserves correctness?

> I don't understand it very well here, you mean it can generate totally identical binaries as the original llc,

I didn't say *totally* identical. IPO with partitioning could be slightly different from IPO without partitioning; say, the former may have duplicated constants in the .text section, which is totally acceptable in practice. So long as the partition remains unchanged, each run of LTO+partition should yield the same result.

> including the function order (function order may not affect code quality, but we should make sure the output is same in each run)?

By "the same", which are involved in the comparison:
1. IPO w/ partition vs. IPO wo/ partition,
2. the n-th run of IPO w/ partition vs. the (n+1)-th run of IPO w/ partition

If 2) should show some difference, it means there are bugs. If you are talking about 1), nobody can guarantee 1) generates *totally* identical object files, and I don't understand why we have such an esoteric need.

From boulos at cs.stanford.edu Tue Jul 16 20:58:49 2013
From: boulos at cs.stanford.edu (Solomon Boulos)
Date: Tue, 16 Jul 2013 20:58:49 -0700
Subject: [LLVMdev] SIMD instructions and memory alignment on X86
In-Reply-To: <51E5F5ED.1000808@uformia.com>
References: <51E5F5ED.1000808@uformia.com>
Message-ID: <7CA02EA7-A9F2-4A91-A10B-24ED35111F3F@cs.stanford.edu>

As someone off list just told me, perhaps my new bug is the same issue: http://llvm.org/bugs/show_bug.cgi?id=16640

Do you happen to be using FastISel?
Solomon

On Jul 16, 2013, at 6:39 PM, Peter Newman wrote:

> Hello all,
>
> I'm currently in the process of debugging a crash occurring in our program. In LLVM 3.2 and 3.3 it appears that JIT-generated code is attempting to access unaligned memory with an SSE2 instruction. However, this only happens under certain conditions that seem (but may not be) related to the stack's state on calling the function.
>
> Our program acts as a front-end, using the LLVM C++ API to generate a JIT-generated function. This function is primarily mathematical, so we use the Vector types to take advantage of SIMD instructions (as well as a few SSE2 intrinsics).
>
> This worked in LLVM 2.8 but started failing in 3.2 and has continued to fail in 3.3. It fails with no optimizations applied to the LLVM Function/Module. It crashes with what is reported as a memory access error (accessing 0xffffffff); however, it's suggested that this is how the SSE fault-raising mechanism appears.
>
> The generated instruction varies, but it seems to often be similar to (I don't have it in front of me, sorry):
> movapd xmm0, xmm[ecx+0x???????]
> where the xmm register changes, and the second parameter is a memory access.
> ECX is always set to 0x7ffffff - however, I don't know if this is part of the SSE error reporting process or is part of the situation causing the error.
>
> I haven't worked out exactly what code path etc. is causing this crash. I'm hoping that someone can tell me whether there were any changed requirements for working with SIMD in LLVM 3.2 (or earlier; we haven't tried 3.0 or 3.1). I currently suspect the use of GlobalVariable (we first discovered the crash when using a feature that uses them); however, I have attempted using setAlignment on the GlobalVariables without any change.
> > -- > Peter N > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From bob.wilson at apple.com Tue Jul 16 21:34:55 2013 From: bob.wilson at apple.com (Bob Wilson) Date: Tue, 16 Jul 2013 21:34:55 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: References: Message-ID: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> On Jul 16, 2013, at 5:51 PM, Eli Friedman wrote: > On Tue, Jul 16, 2013 at 5:21 PM, Quentin Colombet wrote: >> ** Advices Needed ** >> >> 1. Decide whether or not we want such capabilities (if we do not we may just >> add sporadically the support for a new warning/group of warning/error). >> 2. Come up with a plan to implement that (assuming we want it). > > The frontend should be presenting warnings, not the backend; adding a > hook which provides the appropriate information shouldn't be too hard. > Warnings coming out of the backend are very difficult to design well, > so I don't expect we will add many. Also, keep in mind that the > information coming out of the backend could be used in other ways; it > might not make sense for the backend to decide that some piece of > information should be presented as a warning. (Consider, for example, > IDE integration to provide additional information about functions and > loops on demand.) I think we definitely need this. In fact, I tried adding something simple earlier this year but gave up when I realized that the task was bigger than I expected. We already have a hook for diagnostics that can be easily extended to handle warnings as well as errors (which is what I tried earlier), but the problem is that it is hardwired for inline assembly errors. To do this right, new warnings really need to be associated with warning groups so that can be controlled from the front-end. I agree with Eli that there probably won’t be too many of these. 
Adding a few new entries to clang’s diagnostic .td files would be fine, except that the backend doesn’t see those. It seems like we will need something in llvm that defines a set of “backend diagnostics”, along with a table in the frontend to correlate those with the corresponding clang diagnostics. That seems awkward at best, but maybe it’s tolerable as long as there aren’t many of them.

—Bob
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From atrick at apple.com Tue Jul 16 21:38:24 2013
From: atrick at apple.com (Andrew Trick)
Date: Tue, 16 Jul 2013 21:38:24 -0700
Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man
Message-ID: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com>

Since introducing the new TargetTransformInfo analysis, there has been some confusion over the role of target heuristics in IR passes. A few patches have led to interesting discussions. To centralize the discussion, until we get some documentation and better APIs in place, let me throw out an oversimplified Straw Man for a new pass pipeline. It serves two purposes: (1) an overdue reorganization of the pass pipeline (2) a formalization of the role of TargetTransformInfo.

---

Canonicalization passes are designed to normalize the IR in order to expose opportunities to subsequent machine independent passes. This simplifies writing machine independent optimizations and improves the quality of the compiler. An important property of these passes is that they are repeatable. They may be invoked multiple times after inlining and should converge to a canonical form. They should not destructively transform the IR in a way that defeats subsequent analysis. Canonicalization passes can make use of data layout and are affected by ABI, but are otherwise target independent. Adding target specific hooks to these passes can defeat the purpose of canonical IR.
IR Canonicalization Pipeline:

Function Passes {
  SimplifyCFG
  SROA-1
  EarlyCSE
}
Call-Graph SCC Passes {
  Inline
  Function Passes {
    EarlyCSE
    SimplifyCFG
    InstCombine
    Early Loop Opts {
      LoopSimplify
      Rotate (when obvious)
      Full-Unroll (when obvious)
    }
    SROA-2
    InstCombine
    GVN
    Reassociate
    Generic Loop Opts {
      LICM (Rotate on-demand)
      Unswitch
    }
    SCCP
    InstCombine
    JumpThreading
    CorrelatedValuePropagation
    AggressiveDCE
  }
}

IR optimizations that require target information or destructively modify the IR can run in a separate pipeline. This makes for a cleaner distinction between passes that may and may not use TargetTransformInfo. TargetTransformInfo encapsulates legal types and operation costs. IR instruction costs are approximate and relative. They do not represent def-use latencies, nor do they distinguish between latency and CPU resource requirements--that level of machine modeling needs to be done in MI passes.

IR Lowering Pipeline:

Function Passes {
  Target SimplifyCFG (OptimizeCFG?)
  Target InstCombine (InstOptimize?)
  Target Loop Opts {
    SCEV
    IndvarSimplify (mainly sxt/zxt elimination)
    Vectorize/Unroll
    LSR (move LFTR here too)
  }
  SLP Vectorize
  LowerSwitch
  CodeGenPrepare
}

---

The above pass ordering is roughly something I think we can live with. Notice that I have: Full-Unroll -> SROA-2 -> GVN -> Loop-Opts, since that solves some issues we have today. I don't currently have any reason to reorder the "late" IR optimization passes (those after generic loop opts). We either need a GVN utility that loop opts and lowering passes may call on-demand after performing code motion, or we can rerun a non-iterative GVN-lite as a cleanup after lowering passes. If anyone can think of important dependencies between IR passes, this would be a good time to point it out. We could probably make an adjustment to the ‘opt' driver so that the user can specify any mix of canonical and lowering passes. The first lowering pass and subsequent passes would run in the lowering function pass manager.
‘llc' could also optionally run the lowering pass pipeline for as convenience for users who want to run ‘opt' without specifying a triple/cpu. -Andy From Pidgeot18 at gmail.com Tue Jul 16 21:44:39 2013 From: Pidgeot18 at gmail.com (=?UTF-8?B?Sm9zaHVhIENyYW5tZXIg8J+Qpw==?=) Date: Tue, 16 Jul 2013 23:44:39 -0500 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: References: Message-ID: <51E62137.3010403@gmail.com> On 7/16/2013 10:38 PM, Sean Silva wrote: > > > > On Tue, Jul 16, 2013 at 5:21 PM, Quentin Colombet > wrote: > > > The challenge is, AFAICT (which is not much, I admit), that > front-end warning types are statically handled using tablegen > representation. > > > They can also be added dynamically if necessary. See > DiagnosticsEngine::getCustomDiagID As I recall, after using these from experience, this approach doesn't actually properly handle things like -Werror "automatically." -- Joshua Cranmer Thunderbird and DXR developer Source code archæologist -------------- next part -------------- An HTML attachment was scrubbed... URL: From Pidgeot18 at gmail.com Tue Jul 16 21:47:24 2013 From: Pidgeot18 at gmail.com (=?UTF-8?B?Sm9zaHVhIENyYW5tZXIg8J+Qpw==?=) Date: Tue, 16 Jul 2013 23:47:24 -0500 Subject: [LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation In-Reply-To: <851E09B5CA368045827A32DA76E440AF01972405@SHSMSX104.ccr.corp.intel.com> References: <851E09B5CA368045827A32DA76E440AF019718D4@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF019718EA@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF01971907@SHSMSX104.ccr.corp.intel.com> <3C05A56F-310F-4816-BED5-C7DE530D712C@apple.com> <851E09B5CA368045827A32DA76E440AF01971B33@SHSMSX104.ccr.corp.intel.com> <51E587CB.8000505@gmail.com> <851E09B5CA368045827A32DA76E440AF01972405@SHSMSX104.ccr.corp.intel.com> Message-ID: <51E621DC.60903@gmail.com> On 7/16/2013 9:51 PM, Wan, Xiaofei wrote: > [Xiaofei] why? 
I don't understand it very well here, you mean it can > generate totally identical binaries as the original llc, including the > function order (function order may not affect code quality, but we > should make sure the output is same in each run)? Per , function order can affect performance by up to 15%. -- Joshua Cranmer Thunderbird and DXR developer Source code archæologist From xiaofei.wan at intel.com Tue Jul 16 22:30:18 2013 From: xiaofei.wan at intel.com (Wan, Xiaofei) Date: Wed, 17 Jul 2013 05:30:18 +0000 Subject: [LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation In-Reply-To: <51E621DC.60903@gmail.com> References: <851E09B5CA368045827A32DA76E440AF019718D4@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF019718EA@SHSMSX104.ccr.corp.intel.com> <851E09B5CA368045827A32DA76E440AF01971907@SHSMSX104.ccr.corp.intel.com> <3C05A56F-310F-4816-BED5-C7DE530D712C@apple.com> <851E09B5CA368045827A32DA76E440AF01971B33@SHSMSX104.ccr.corp.intel.com> <51E587CB.8000505@gmail.com> <851E09B5CA368045827A32DA76E440AF01972405@SHSMSX104.ccr.corp.intel.com> <51E621DC.60903@gmail.com> Message-ID: <851E09B5CA368045827A32DA76E440AF01972571@SHSMSX104.ccr.corp.intel.com> Yes, sometime it may affect code layout which may affect performance; theoretically it is better if we could generate totally identical code, including function order, meanwhile, it is easy to validate whether parallelism is correct, just compare the outputs simply -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Joshua Cranmer ?? Sent: Wednesday, July 17, 2013 12:47 PM To: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation On 7/16/2013 9:51 PM, Wan, Xiaofei wrote: > [Xiaofei] why? 
I don't understand it very well here, you mean it can > generate totally identical binaries as the original llc, including the > function order (function order may not affect code quality, but we > should make sure the output is same in each run)? Per , function order can affect performance by up to 15%. -- Joshua Cranmer Thunderbird and DXR developer Source code archæologist _______________________________________________ LLVM Developers mailing list LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From artagnon at gmail.com Tue Jul 16 23:24:21 2013 From: artagnon at gmail.com (Ramkumar Ramachandra) Date: Wed, 17 Jul 2013 11:54:21 +0530 Subject: [LLVMdev] [PATCH v2] X86: disambiguate unqualified btr, bts In-Reply-To: References: <1373525227-5375-1-git-send-email-artagnon@gmail.com> <51E00EAB.24054.D7D0BE8@pageexec.gmail.com> Message-ID: Jim Grosbach wrote: > No. The above rule is absolutely the wrong thing to do, as has been > previously noted. I don't give a shit about whether you think it is "absolutely wrong" or not; I did what hpa and the Intel manual outlined. If you have some _reason_ not to do that, bring it up. I reported four bugs a few days ago, and the community has shown ZERO (if not NEGATIVE) interest in fixing them. I got Linus and hpa to comment on the issue, and help the community figure out what needs to be done. I posted not one, but MULTIPLE patches demonstrating desirable behavior, despite having ZERO prior experience with compiler engineering. Nobody else has posted a single patch, or helped me write one; instead, they have been sitting around being fabulously counter-productive, and stalling all progress. Can you remind me why I'm still trying to help LLVM, and don't just throw it out the window? (Hint: It's sheer persistence; anyone else would've given up a long time ago) Do you value contributors at all? 
(That's a rhetorical, because I already know the answer from the way you've been treating me: no) Do you care about getting LLVM to work with real-world codebases? (Again a rhetorical, because I already know the answer: no) From joerg at britannica.bec.de Tue Jul 16 23:38:02 2013 From: joerg at britannica.bec.de (Joerg Sonnenberger) Date: Wed, 17 Jul 2013 08:38:02 +0200 Subject: [LLVMdev] [PATCH v2] X86: disambiguate unqualified btr, bts In-Reply-To: References: <1373525227-5375-1-git-send-email-artagnon@gmail.com> <51E00EAB.24054.D7D0BE8@pageexec.gmail.com> Message-ID: <20130717063802.GA3258@britannica.bec.de> On Wed, Jul 17, 2013 at 11:54:21AM +0530, Ramkumar Ramachandra wrote: > Jim Grosbach wrote: > > No. The above rule is absolutely the wrong thing to do, as has been > > previously noted. > > I don't give a shit about whether you think it is "absolutely wrong" > or not; I did what hpa and the Intel manual outlined. If you have > some _reason_ not to do that, bring it up. In case you have missed this, this is not LKML. Please keep your abusive language at home. Linus and hpa are no almighty authorities here and this is not the Linux kernel community. Joerg From artagnon at gmail.com Tue Jul 16 23:56:53 2013 From: artagnon at gmail.com (Ramkumar Ramachandra) Date: Wed, 17 Jul 2013 12:26:53 +0530 Subject: [LLVMdev] [PATCH v2] X86: disambiguate unqualified btr, bts In-Reply-To: <20130717063802.GA3258@britannica.bec.de> References: <1373525227-5375-1-git-send-email-artagnon@gmail.com> <51E00EAB.24054.D7D0BE8@pageexec.gmail.com> <20130717063802.GA3258@britannica.bec.de> Message-ID: Joerg Sonnenberger wrote: > Linus and hpa are no almighty authorities here and > this is not the Linux kernel community. Who said anything about almighty authorities, and who mentioned Linus or the kernel community now? Their emails are on the LLVMDev list for everyone to read: I picked up what made sense to me. But whatever. 
From David.Chisnall at cl.cam.ac.uk Wed Jul 17 00:40:58 2013 From: David.Chisnall at cl.cam.ac.uk (David Chisnall) Date: Wed, 17 Jul 2013 08:40:58 +0100 Subject: [LLVMdev] [PATCH v2] X86: disambiguate unqualified btr, bts In-Reply-To: References: <1373525227-5375-1-git-send-email-artagnon@gmail.com> <51E00EAB.24054.D7D0BE8@pageexec.gmail.com> <20130717063802.GA3258@britannica.bec.de> Message-ID: <487B77BD-79C1-4180-B7A1-616A7AF28D0A@cl.cam.ac.uk> On 17 Jul 2013, at 07:56, Ramkumar Ramachandra wrote: > who mentioned Linus > or the kernel community now? You did: > I got Linus and hpa to > comment on the issue, Linus' comments were also confrontational and impolite, and he then proceeded to continue Linux-specific discussions that were completely off-topic for this list while keeping LLVMDev on the cc: list, wasting the time (and bandwidth) of all of the subscribers to this list who are not Linux developers. The Linux kernel community has been in the tech news in the last couple of days defending abusive behaviour on mailing lists. This is something that we do not accept in the LLVM community, nor in any of the other open source communities that I am a member of. For example: > I don't give a shit about whether you think it is "absolutely wrong" > or not This is completely inappropriate for any public mailing list. Apparently the LKML puts up with this kind of things, but most community driven projects do not. > Do you value contributors at all? (That's a rhetorical, because I > already know the answer from the way you've been treating me: no) Yes, contributors are valuable, however contributors are members of a community and are expected to behave in a way that reflects this. There are a lot of people on this mailing list that I have argued with on technical matters, but none has ever felt the need to resort to profanity or personal attacks. > Do you care about getting LLVM to work with real-world codebases? 
> (Again a rhetorical, because I already know the answer: no) You mean like iOS, OS X, or FreeBSD, which all use Clang/LLVM as their system compiler? Or perhaps the Android NDK, which ships Clang 3.1 in the latest release and has Clang 3.3 in trunk? Or Debian, which now tests building all packages with Clang and has a repository where they can be downloaded? Yes, we're quite interested in getting LLVM to work with real-world codebases. In many cases, we encounter a question of whether to support poor design choices in GCC that are used by a small number of packages, or impose something stricter, which benefits everyone in the long run in terms of code cleanups. You view Linux as a very important package. This list has some Google people who might agree with you, but it also has a lot of Apple employees, and people like Joerg (who led the integration of Clang and NetBSD) and myself (a member of the FreeBSD Core Team), for whom it's just some third-party codebase with a reputation for relying on GCC-specific behaviour and containing code that relies on undefined behaviour and a vocal community that complains to GCC whenever they change their interpretation of undefined behaviour. If we can support it, that's great. If supporting it comes at a cost to other users of LLVM (either in terms of worse code, or less error reporting), then that's something that will need to be evaluated carefully on its merits, which is what the discussion that was already in progress was doing. David From renato.golin at linaro.org Wed Jul 17 00:54:22 2013 From: renato.golin at linaro.org (Renato Golin) Date: Wed, 17 Jul 2013 08:54:22 +0100 Subject: [LLVMdev] Cambridge LLVM Social Today! Message-ID: Hi folks, The Cambridge social is today, and with this weather, we'll be at the fort st. George, hoping that there are no fairs outside to compromise our beer enjoyment programme. See you at the George around 7:30! 
Cheers,
Renato
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From james.dutton at gmail.com Wed Jul 17 00:54:34 2013
From: james.dutton at gmail.com (James Courtier-Dutton)
Date: Wed, 17 Jul 2013 08:54:34 +0100
Subject: [LLVMdev] [PATCH v2] X86: disambiguate unqualified btr, bts
In-Reply-To: <20130717063802.GA3258@britannica.bec.de>
References: <1373525227-5375-1-git-send-email-artagnon@gmail.com> <51E00EAB.24054.D7D0BE8@pageexec.gmail.com> <20130717063802.GA3258@britannica.bec.de>
Message-ID:

On Jul 17, 2013 7:41 AM, "Joerg Sonnenberger" wrote:
>
> On Wed, Jul 17, 2013 at 11:54:21AM +0530, Ramkumar Ramachandra wrote:
> > Jim Grosbach wrote:
> > > No. The above rule is absolutely the wrong thing to do, as has been
> > > previously noted.
> >
> > I don't give a shit about whether you think it is "absolutely wrong"
> > or not; I did what hpa and the Intel manual outlined. If you have
> > some _reason_ not to do that, bring it up.
>
> In case you have missed this, this is not LKML. Please keep your abusive
> language at home. Linus and hpa are no almighty authorities here and
> this is not the Linux kernel community.
>
Linus is an expert on the x86 instruction set, so his advice on what to do with bts and btc should be taken seriously. So I think that although Linus is not an LLVM authority, he should be considered an x86 authority.

James.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kcc at google.com Wed Jul 17 01:17:30 2013
From: kcc at google.com (Kostya Serebryany)
Date: Wed, 17 Jul 2013 12:17:30 +0400
Subject: [LLVMdev] Error building compiler-rt
In-Reply-To:
References:
Message-ID:

+Sergey Matveev

On Wed, Jul 10, 2013 at 2:00 AM, Andy Jost wrote:
> Ok, after familiarizing myself with clone it appears to me this is a bug
> in compiler-rt.
>
> From the clone man page:
>
> In Linux 2.4 and earlier, clone() does not take arguments ptid, tls, and
> ctid.
>
> The source file passes those arguments without any fencing to check the
> Linux version. Also, ptid, tls, and ctid are only used in conjunction with
> certain flags (e.g., CLONE_PARENT_SETTID), but none of those flags are set.
>
> It looks like the fix (for all Linux versions) would be to simply remove
> the last three arguments from the call.
>
> -Andy
>
> *From:* llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] *On
> Behalf Of *Andy Jost
> *Sent:* Tuesday, July 09, 2013 2:44 PM
> *To:* LLVMdev at cs.uiuc.edu
> *Subject:* [LLVMdev] Error building compiler-rt
>
> Hi,
>
> I get the following error while building compiler-rt:
>
> /slowfs/msret_s1_us03/ajost/src/llvm-3.3.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_stoptheworld_linux.cc:315:22:
> error: no matching function for call to 'clone'
>
> pid_t tracer_pid = clone(TracerThread, tracer_stack.Bottom(),
> ^~~~~
>
> /usr/include/bits/sched.h:71:12: note: candidate function not viable:
> requires 4 arguments, but 7 were provided
>
> extern int clone (int (*__fn) (void *__arg), void *__child_stack,
>
> Inside sched.h, clone is indeed declared with four arguments, but,
> interestingly, the man page for clone provides this prototype:
>
> #include
>
> int clone(int (*fn)(void *), void *child_stack,
> int flags, void *arg, ...
> /* pid_t *pid, struct user_desc *tls, pid_t *ctid */ );
>
> I’m running RedHat EL 2.6.9-89.ELlargesmp without root privileges.
>
> Is this a bug in LLVM? Do I just have an old version of clone that’s not
> supported by LLVM? I can try just removing the last three arguments from
> the compiler-rt source, but is that the best solution?
> If someone can
> point out a clean way to fix this, then I don’t mind trying to contribute a
> patch (I would need to learn how).
>
> Also, is this something that autoconf should have detected? What should
> it have done about it?
>
> -Andy
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From artagnon at gmail.com Wed Jul 17 01:23:19 2013
From: artagnon at gmail.com (Ramkumar Ramachandra)
Date: Wed, 17 Jul 2013 13:53:19 +0530
Subject: [LLVMdev] [PATCH v2] X86: disambiguate unqualified btr, bts
In-Reply-To: <487B77BD-79C1-4180-B7A1-616A7AF28D0A@cl.cam.ac.uk>
References: <1373525227-5375-1-git-send-email-artagnon@gmail.com> <51E00EAB.24054.D7D0BE8@pageexec.gmail.com> <20130717063802.GA3258@britannica.bec.de> <487B77BD-79C1-4180-B7A1-616A7AF28D0A@cl.cam.ac.uk>
Message-ID:

David Chisnall wrote:
>> I got Linus and hpa to
>> comment on the issue,
>
> Linus' comments were also confrontational and impolite, and he then proceeded to continue Linux-specific discussions that were completely off-topic for this list while keeping LLVMDev on the cc: list, wasting the time (and bandwidth) of all of the subscribers to this list who are not Linux developers.

I'm sorry you didn't find their inputs valuable, and discriminate against them on the basis of tone. I personally found them _very_ insightful.

> The Linux kernel community has been in the tech news in the last couple of days defending abusive behaviour on mailing lists. This is something that we do not accept in the LLVM community, nor in any of the other open source communities that I am a member of. For example:

They run the world's largest open source project; whether you think their behavior is "abusive" or not is inconsequential.
That said, you do have every right to choose how you want your community to be run.

>> I don't give a shit about whether you think it is "absolutely wrong"
>> or not
>
> This is completely inappropriate for any public mailing list. Apparently
> the LKML puts up with this kind of thing, but most community-driven
> projects do not.

Making transcendental statements without evidence is getting us nowhere: I
can't speak for communities other than the ones I've personally been
involved with, and neither should you. It just so happens that I've been
involved with the git community for a long time. Irrespective of whether I
personally think my message was "appropriate" or not, I apologize because
the LLVM community seems to be offended by it.

>> Do you value contributors at all? (That's a rhetorical, because I
>> already know the answer from the way you've been treating me: no)
>
> Yes, contributors are valuable; however, contributors are members of a
> community and are expected to behave in a way that reflects this. There
> are a lot of people on this mailing list that I have argued with on
> technical matters, but none has ever felt the need to resort to profanity
> or personal attacks.

You've conveniently ignored the facts, and focused all your attention on
criticizing the tone of my last email. Want to write an essay about it?

>> Do you care about getting LLVM to work with real-world codebases?
>> (Again a rhetorical, because I already know the answer: no)
>
> You mean like iOS, OS X, or FreeBSD, which all use Clang/LLVM as their
> system compiler? Or perhaps the Android NDK, which ships Clang 3.1 in the
> latest release and has Clang 3.3 in trunk? Or Debian, which now tests
> building all packages with Clang and has a repository where they can be
> downloaded?

I'm well aware of what LLVM powers, thank you. I've seen it up close in
Rust and Rubinius.

> Yes, we're quite interested in getting LLVM to work with real-world codebases.
> In many cases, we encounter a question of whether to support poor design
> choices in GCC that are used by a small number of packages, or impose
> something stricter, which benefits everyone in the long run in terms of
> code cleanups.

Don't talk hypotheticals and make vague generalizations. I posted a patch,
which nobody bothered to look at [1]. Those are the facts.

> You view Linux as a very important package. This list has some Google
> people who might agree with you, but it also has a lot of Apple employees,
> and people like Joerg (who led the integration of Clang and NetBSD) and
> myself (a member of the FreeBSD Core Team), for whom it's just some
> third-party codebase with a reputation for relying on GCC-specific
> behaviour, for containing code that relies on undefined behaviour, and for
> having a vocal community that complains to GCC whenever they change their
> interpretation of undefined behaviour. If we can support it, that's great.
> If supporting it comes at a cost to other users of LLVM (either in terms
> of worse code, or less error reporting), then that's something that will
> need to be evaluated carefully on its merits, which is what the discussion
> that was already in progress was doing.

More vague generalizations. I couldn't care less about who your employers
are. I just want LLVM to get more users; period. Am I being overtly
unreasonable, selfish, or demanding?

[1]: http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-July/063799.html

From resistor at mac.com Wed Jul 17 01:26:40 2013
From: resistor at mac.com (Owen Anderson)
Date: Wed, 17 Jul 2013 01:26:40 -0700
Subject: [LLVMdev] [PATCH v2] X86: disambiguate unqualified btr, bts
In-Reply-To: 
References: <1373525227-5375-1-git-send-email-artagnon@gmail.com>
	<51E00EAB.24054.D7D0BE8@pageexec.gmail.com>
	<20130717063802.GA3258@britannica.bec.de>
Message-ID: 

On Jul 17, 2013, at 12:54 AM, James Courtier-Dutton wrote:

> On Jul 17, 2013 7:41 AM, "Joerg Sonnenberger" wrote:
> > In case you have missed this, this is not LKML.
> > Please keep your abusive language at home. Linus and hpa are no almighty
> > authorities here and this is not the Linux kernel community.
>
> Linus is an expert on the x86 instruction set, so his advice on what to do
> with bts and btc should be taken seriously. So I think that although Linus
> is not an LLVM authority, he should be considered an x86 authority.

There are many people with many years of experience with x86 on this
mailing list, including compiler and assembler developers, kernel
programmers, assembly-level performance experts, and instruction set
architects. In addition to being familiar with LLVM's design and
philosophy, they also represent dozens of interested parties, both open
source and corporate. Nobody will deny that Linus is an x86 expert, but
this list is full of them, and it is clear from the discussion that has
been going on that not all of them are in agreement with him.

--Owen

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From earthdok at google.com Wed Jul 17 01:33:43 2013
From: earthdok at google.com (Sergey Matveev)
Date: Wed, 17 Jul 2013 12:33:43 +0400
Subject: [LLVMdev] Error building compiler-rt
In-Reply-To: 
References: 
Message-ID: 

This is already fixed.

On Wed, Jul 17, 2013 at 12:17 PM, Kostya Serebryany wrote:

> +Sergey Matveev
>
> On Wed, Jul 10, 2013 at 2:00 AM, Andy Jost wrote:
>
>> Ok, after familiarizing myself with clone it appears to me this is a
>> bug in compiler-rt.
>>
>> From the clone man page:
>>
>> In Linux 2.4 and earlier, clone() does not take arguments ptid, tls, and
>> ctid.
>>
>> The source file passes those arguments without any fencing to check the
>> Linux version. Also, ptid, tls, and ctid are only used in conjunction with
>> certain flags (e.g., CLONE_PARENT_SETTID), but none of those flags are set.
>>
>> It looks like the fix (for all Linux versions) would be to simply remove
>> the last three arguments from the call.
>>
>> -Andy
>>
>> *From:* llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] *On Behalf Of* Andy Jost
>> *Sent:* Tuesday, July 09, 2013 2:44 PM
>> *To:* LLVMdev at cs.uiuc.edu
>> *Subject:* [LLVMdev] Error building compiler-rt
>>
>> Hi,
>>
>> I get the following error while building compiler-rt:
>>
>> /slowfs/msret_s1_us03/ajost/src/llvm-3.3.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_stoptheworld_linux.cc:315:22:
>> error: no matching function for call to 'clone'
>>
>>   pid_t tracer_pid = clone(TracerThread, tracer_stack.Bottom(),
>>                      ^~~~~
>>
>> /usr/include/bits/sched.h:71:12: note: candidate function not viable:
>> requires 4 arguments, but 7 were provided
>>
>>   extern int clone (int (*__fn) (void *__arg), void *__child_stack,
>>
>> Inside sched.h, clone is indeed declared with four arguments, but,
>> interestingly, the man page for clone provides this prototype:
>>
>>   #include <sched.h>
>>
>>   int clone(int (*fn)(void *), void *child_stack,
>>             int flags, void *arg, ...
>>             /* pid_t *pid, struct user_desc *tls, pid_t *ctid */ );
>>
>> I'm running RedHat EL 2.6.9-89.ELlargesmp without root privileges.
>>
>> Is this a bug in LLVM? Do I just have an old version of clone that's not
>> supported by LLVM? I can try just removing the last three arguments from
>> the compiler-rt source, but is that the best solution? If someone can
>> point out a clean way to fix this, then I don't mind trying to contribute a
>> patch (I would need to learn how).
>>
>> Also, is this something that autoconf should have detected?
What should
>> it have done about it?
>>
>> -Andy
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From chandlerc at google.com Wed Jul 17 02:12:29 2013
From: chandlerc at google.com (Chandler Carruth)
Date: Wed, 17 Jul 2013 02:12:29 -0700
Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM.
In-Reply-To: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com>
References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com>
Message-ID: 

On Tue, Jul 16, 2013 at 9:34 PM, Bob Wilson wrote:

> On Jul 16, 2013, at 5:51 PM, Eli Friedman wrote:
>
> On Tue, Jul 16, 2013 at 5:21 PM, Quentin Colombet wrote:
>
> ** Advices Needed **
>
> 1. Decide whether or not we want such capabilities (if we do not we may just
> add sporadically the support for a new warning/group of warning/error).
> 2. Come up with a plan to implement that (assuming we want it).
>
> The frontend should be presenting warnings, not the backend; adding a
> hook which provides the appropriate information shouldn't be too hard.
> Warnings coming out of the backend are very difficult to design well,
> so I don't expect we will add many. Also, keep in mind that the
> information coming out of the backend could be used in other ways; it
> might not make sense for the backend to decide that some piece of
> information should be presented as a warning. (Consider, for example,
> IDE integration to provide additional information about functions and
> loops on demand.)
>
> I think we definitely need this. In fact, I tried adding something simple
> earlier this year but gave up when I realized that the task was bigger than
> I expected.
> We already have a hook for diagnostics that can be easily
> extended to handle warnings as well as errors (which is what I tried
> earlier), but the problem is that it is hardwired for inline assembly
> errors. To do this right, new warnings really need to be associated with
> warning groups so that they can be controlled from the front-end.
>
> I agree with Eli that there probably won't be too many of these. Adding a
> few new entries to clang's diagnostic .td files would be fine, except that
> the backend doesn't see those. It seems like we will need something in
> llvm that defines a set of "backend diagnostics", along with a table in the
> frontend to correlate those with the corresponding clang diagnostics. That
> seems awkward at best but maybe it's tolerable as long as there aren't many
> of them.

I actually think this is the wrong approach, and I don't think it's quite
what Eli or I am suggesting. (Of course, Eli may want to clarify; I'm only
really clarifying what *I'm* suggesting.)

I think all of the warnings should be in the frontend, using the standard
and existing machinery for generating, controlling, and displaying a
warning. We already know how to do that well. The difference is that these
warnings will need to query the LLVM layer for detailed information through
some defined API, and base the warning on this information.

This accomplishes two things:

1) It ensures the warning machinery is simple, predictable, and integrates
cleanly with everything else in Clang. It does so in the best way by simply
being the existing machinery.

2) It forces us to design reasonable APIs in LLVM to expose to a FE for
this information. A consequence of this will be to sort out the layering
issues, etc. Another consequence will be a strong chance of finding general
purpose APIs in LLVM that can serve many purposes, not just a warning.
Consider JITs and other systems that might benefit from having good APIs
for querying the size and makeup (at a high level) of a generated function.

A nice side-effect is that it simplifies the complexity involved for simple
warnings -- now it is merely the complexity of exposing the commensurately
simple API in LLVM. If instead we go the route of threading a FE interface
for *reporting* warnings into LLVM, we have to thread an interface with
sufficient power to express many different concepts.

-Chandler
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From elena.demikhovsky at intel.com Wed Jul 17 03:20:07 2013
From: elena.demikhovsky at intel.com (Demikhovsky, Elena)
Date: Wed, 17 Jul 2013 10:20:07 +0000
Subject: [LLVMdev] Operand constraint specification
In-Reply-To: <860743FD-8984-4458-9D65-79F001060D54@apple.com>
References: <397121722.12373772.1373992344390.JavaMail.root@alcf.anl.gov>
	<860743FD-8984-4458-9D65-79F001060D54@apple.com>
Message-ID: 

Thank you, it helps!

- Elena

From: Jim Grosbach [mailto:grosbach at apple.com]
Sent: Tuesday, July 16, 2013 22:57
To: Hal Finkel; Demikhovsky, Elena
Cc: LLVMdev List
Subject: Re: [LLVMdev] Operand constraint specification

On Jul 16, 2013, at 9:32 AM, Hal Finkel wrote:

> ----- Original Message -----
>> Hi,
>>
>> How can I specify in a .td file that source and destination should not
>> use the same register?
>
> I think that you can use the EarlyClobber operand flag to achieve this
> (TableGen has an @earlyclobber constraint; there are some examples in the
> ARM backend).

Yes, that's exactly what that constraint is for.

-Jim

> -Hal
>
>> Thanks.
>>
>> * Elena

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
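[Archive editor's note: to make the @earlyclobber answer above concrete, here is a hypothetical .td fragment. The MYOP instruction, MyInst class, and GPR register class are made-up names for illustration, not taken from any real backend.]

```tablegen
// Hypothetical instruction record. "@earlyclobber $dst" tells the
// register allocator that $dst is written before all inputs are read,
// so $dst may not share a register with $src1 or $src2.
def MYOP : MyInst<(outs GPR:$dst), (ins GPR:$src1, GPR:$src2),
                  "myop $dst, $src1, $src2", []> {
  let Constraints = "@earlyclobber $dst";
}
```

As noted in the reply, the ARM backend's .td files contain real uses of this constraint to study.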
_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tscheller at apple.com Wed Jul 17 03:43:28 2013
From: tscheller at apple.com (Tilmann Scheller)
Date: Wed, 17 Jul 2013 12:43:28 +0200
Subject: [LLVMdev] eclipse and gdb
In-Reply-To: <51E5F130.1080400@mips.com>
References: <51E528ED.5060307@mips.com>
	<8CC341AF-45FF-4C84-9E0A-D5143A6D20CD@apple.com>
	<51E5EAFD.3020203@mips.com> <51E5ECDA.5000107@mips.com>
	<51E5F130.1080400@mips.com>
Message-ID: <33D99FA7-D537-4F06-B5D9-29C4B98FDCDE@apple.com>

Hi Reed,

On Jul 17, 2013, at 3:19 AM, Reed Kotler wrote:
> On 07/16/2013 06:01 PM, Reed Kotler wrote:
>> The Eclipse indexer seems to get stuck in the Clang unittests/AST
>
> In Eclipse you can tell it that a given directory is derived, and then
> it won't try and index it.
>
> Probably the more complex clang tests are too involved for the indexer.

Yeah, I vaguely remember having to exclude some clang tests from
indexing because they take way too long/way too much memory to index.

Regards,

Tilmann

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From baldrick at free.fr Wed Jul 17 04:13:45 2013
From: baldrick at free.fr (Duncan Sands)
Date: Wed, 17 Jul 2013 13:13:45 +0200
Subject: [LLVMdev] General strategy to optimize LLVM IR
In-Reply-To: <88AE2587-C876-4E7C-A4F9-C864367341E7@grame.fr>
References: <88AE2587-C876-4E7C-A4F9-C864367341E7@grame.fr>
Message-ID: <51E67C69.8010302@free.fr>

Hi Stéphane,

On 16/07/13 17:16, Stéphane Letz wrote:
> Hi,
>
> Our DSL emits sub-optimal LLVM IR that we optimize later on (LLVM IR ==>
> LLVM IR) before dynamically compiling it with the JIT. We would like to
> simply follow what clang/clang++ does when compiling with -O1/-O2/-O3
> options. Our strategy up to now was to look at the opt.cpp code and take
> part of it in order to implement our optimization code.
>
> It appears to be rather difficult to follow the evolution of the LLVM IR
> optimization strategies. With LLVM 3.3 our optimization code does not
> produce code as fast as the one produced with clang -O3 anymore. Moreover
> the new vectorization passes are still not working.
>
> Is there a recommended way to add -O1/-O2/-O3 kinds of optimizations on
> LLVM IR code? Any code to look at beside the opt.cpp tool?

The list of passes (and the flags that can be used to tweak it) is in
lib/Transforms/IPO/PassManagerBuilder.cpp. You can use the
PassManagerBuilder to create your own pass list. However this is not enough
to get good optimization; some more things are needed:

1) You must add DataLayout info to the module (using setDataLayout). For
the vectorizer to do anything I think you are also obliged to add a target
triple (using setTargetTriple);

2) In order to get vectorization you also have to add target specific
analysis passes using addAnalysisPasses (see TargetMachine).

Ciao, Duncan.
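[Archive editor's note: Duncan's advice can be sketched roughly as below. This is an untested sketch against the LLVM 3.3-era C++ API and requires an LLVM checkout to build; the optimizeModule wrapper and the inliner threshold are illustrative assumptions, not code from this thread.]

```cpp
#include "llvm/IR/Module.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/PassManager.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Transforms/IPO.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"

// Build a clang -O3-like pipeline with PassManagerBuilder instead of
// copying pass lists out of opt.cpp.
static void optimizeModule(llvm::Module &M, llvm::TargetMachine *TM) {
  llvm::PassManagerBuilder Builder;
  Builder.OptLevel = 3;
  Builder.Inliner = llvm::createFunctionInliningPass(275); // assumed -O3-ish threshold
  Builder.LoopVectorize = true;

  llvm::PassManager MPM;
  // 1) DataLayout info must be attached; the module should also carry a
  //    target triple (Module::setTargetTriple) for vectorization to fire.
  MPM.add(new llvm::DataLayout(&M));
  // 2) Target-specific analyses (e.g. TargetTransformInfo) come from the
  //    TargetMachine.
  if (TM)
    TM->addAnalysisPasses(MPM);

  Builder.populateModulePassManager(MPM);
  MPM.run(M);
}
```

The exact -O1/-O2/-O3 recipes (inliner thresholds, extra passes) live in PassManagerBuilder.cpp, so following that file remains the authoritative way to match clang's behaviour.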
From tscheller at apple.com Wed Jul 17 04:19:56 2013
From: tscheller at apple.com (Tilmann Scheller)
Date: Wed, 17 Jul 2013 13:19:56 +0200
Subject: [LLVMdev] eclipse and gdb
In-Reply-To: <51E5B6C7.6000404@mips.com>
References: <51E528ED.5060307@mips.com>
	<8CC341AF-45FF-4C84-9E0A-D5143A6D20CD@apple.com>
	<51E5B6C7.6000404@mips.com>
Message-ID: 

On Jul 16, 2013, at 11:10 PM, Reed Kotler wrote:

> The source browsing is way better this way.

Definitely! Once I used this for the first time I never wanted to go back
to grep for source navigation, it's so much faster :)

> How are you setting up the debugger?
>
> For example, if you want to run from clang but debug the back end code
> generation?

I just create a new launch configuration specifying the binary/working
directory/command line arguments and run it in debug mode.

> BTW: do you do builds inside of eclipse.
> Seems to be kind of slow.

I actually never did a build with Eclipse, only used it for code navigation
and debugging :)

I do builds with Xcode from time to time when I want to debug from within
Xcode (when I'm not using LLDB on the command line) because I haven't
figured out yet how to use Xcode to debug a binary which was built outside
of Xcode. The experience is not that great either, though; in particular,
incremental building seems to be kind of broken. E.g. before launching the
binary in the debugger it will always build the project first to make sure
it's up to date, so in theory, if you run your binary twice and didn't make
any changes to the source code after the first run, then in the second run
you should only have the added overhead of determining that nothing has
changed. However, in practice a significant amount of time is spent on just
determining that nothing has changed, and it also looks like stuff gets
rebuilt even though there's no need to. I haven't spent any time tracking
down what's actually the problem there, but I assume it has to do with the
way CMake generates Xcode projects.
It's possible to start debugging without building, though, so it's rather
easy to work around :)

Regards,

Tilmann

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tscheller at apple.com Wed Jul 17 04:35:56 2013
From: tscheller at apple.com (Tilmann Scheller)
Date: Wed, 17 Jul 2013 13:35:56 +0200
Subject: [LLVMdev] eclipse and gdb
In-Reply-To: 
References: <51E528ED.5060307@mips.com>
	<8CC341AF-45FF-4C84-9E0A-D5143A6D20CD@apple.com>
	<51E5B6C7.6000404@mips.com>
Message-ID: <73F06A8B-BC28-443F-B921-831DA8A2B580@apple.com>

On Jul 17, 2013, at 1:19 PM, Tilmann Scheller wrote:

> I actually never did a build with Eclipse, only used it for code
> navigation and debugging :)

Actually that's not really true, I did build with Eclipse from time to
time to get all the sources TableGen generates automatically. This is
really nice because the source navigation works just fine across
handwritten and automatically generated files.

What I meant to say is that I have two build directories, one for
Eclipse/Xcode to have a project file for source navigation and one for my
regular build on the console which I use for all the actual development
work.

Regards,

Tilmann

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From stefan at stefant.org Wed Jul 17 05:28:20 2013
From: stefan at stefant.org (Stefan Hepp)
Date: Wed, 17 Jul 2013 12:28:20 +0000
Subject: [LLVMdev] Instantiating Target-Specific ASM Parser
In-Reply-To: 
References: 
Message-ID: <51E68DE4.5040609@stefant.org>

Hi,

We are using the AsmParser to analyse inline asm instructions in our
backend, i.e. to get the actual size in bytes of an instruction or to find
all calls,.. I assume this might be similar to what you want to do.

I let AsmPrinter do all the hard work of parsing inline asm and
instantiating everything. This might be a bit of an overkill though if you
do not need inline-asm parsing (it replaces placeholders with proper
register names, ...).
For your case it might be sufficient to have a look at the code in
AsmPrinter::EmitInlineAsm(StringRef Str, ..). You will need to have a
proper context and MCStreamer object though..

Here is what I did to make this happen:

1) Extract a header for the MCNullStreamer class to include/llvm/MC to be
able to make subclasses. I will attach the modified files for your
convenience.

2) Make the EmitInlineAsm functions in AsmPrinter public. Move construction
and deletion of the Mangler in AsmPrinter.cpp into the
constructor/destructor of AsmPrinter. In AsmPrinter::EmitInlineAsm
(AsmPrinterInlineAsm.cpp:93), add an 'if (MMI) { .. }' to skip the
DiagnosticHandler stuff if MMI is not set.

3) Create a subclass of MCNullStreamer to visit all parsed instructions in
the inline asm code, like so:

  class InstrAnalyzer : public MCNullStreamer {
    const MCInstrInfo &MII;
    unsigned size;
  public:
    InstrAnalyzer(MCContext &ctx)
      : MCNullStreamer(ctx), MII(ctx.getInstrInfo()), size(0) { }

    virtual void EmitInstruction(const MCInst &Inst) {
      const MCInstrDesc &MID = MII.get(Inst.getOpcode());
      size += MID.getSize();
    }
  };

4) Create a new AsmPrinter and use it to parse and emit asm, e.g. like so
(our backend is called Patmos, so do not wonder about this..):

  unsigned int PatmosInstrInfo::getInstrSize(const MachineInstr *MI) const {
    if (MI->isInlineAsm()) {
      // PTM is the TargetMachine
      // TODO is there a way to get the current context?
      MCContext Ctx(*PTM.getMCAsmInfo(), *PTM.getRegisterInfo(),
                    *PTM.getInstrInfo(), 0);

      // PIA is deleted by AsmPrinter
      InstrAnalyzer *PIA = new InstrAnalyzer(Ctx);

      // PTM.getTargetLowering()->getObjFileLowering() might not yet be
      // initialized, so we create a new section object for this temp context
      const MCSection* TS = Ctx.getELFSection(".text", ELF::SHT_PROGBITS, 0,
                                              SectionKind::getText());
      PIA->SwitchSection(TS);

      PatmosAsmPrinter PAP(PTM, *PIA);
      PAP.EmitInlineAsm(MI);

      return PIA->getSize();
    } else if (MI->isBundle()) {
      // handle bundles..
      unsigned size = ...;
      return size;
    } else {
      // trust the desc..
      return MI->getDesc().getSize();
    }
  }

I tested this with LLVM 3.2, and it should work with LLVM 3.3 (though I am
not finished testing it..).

Cheers,
  Stefan Hepp

On 07/16/2013 02:27 PM, Dunn, Kyle wrote:
> Hello,
>
> I am working on backend development and would like to utilize my
> target's MCAsmParser inside of an MCInst-level class implementation. I
> noticed that the AsmParser is registered with the target registry
> however I am having no luck grepping for a "template" of how to
> instantiate it and have yet to find specific documentation on how it is
> done. Any ideas or help is greatly appreciated!
>
> Cheers,
> Kyle Dunn
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

-------------- next part --------------
A non-text attachment was scrubbed...
Name: MCNullStreamer.h
Type: text/x-chdr
Size: 4579 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MCNullStreamer.cpp
Type: text/x-c++src
Size: 877 bytes
Desc: not available
URL: 

From stefan at stefant.org Wed Jul 17 06:08:24 2013
From: stefan at stefant.org (Stefan Hepp)
Date: Wed, 17 Jul 2013 13:08:24 +0000
Subject: [LLVMdev] eclipse and gdb
In-Reply-To: <33D99FA7-D537-4F06-B5D9-29C4B98FDCDE@apple.com>
References: <51E528ED.5060307@mips.com>
	<8CC341AF-45FF-4C84-9E0A-D5143A6D20CD@apple.com>
	<51E5EAFD.3020203@mips.com> <51E5ECDA.5000107@mips.com>
	<51E5F130.1080400@mips.com>
	<33D99FA7-D537-4F06-B5D9-29C4B98FDCDE@apple.com>
Message-ID: <51E69748.7010003@stefant.org>

Hi,

I am using Eclipse to edit the files, and I used it to debug with gdb as
well (but I went back to gdb on the command line, Eclipse GDB UI is just
too slow and buggy for me ..).

You need to exclude the clang preprocessor/..
stress tests from the sources, otherwise the indexer will freeze Eclipse.
You should also remove some autogenerated CMake project subdirectories,
otherwise your files will be opened multiple times when navigating the
sources, which *will* cause you to lose/overwrite changes when you save.

Be aware that building the project will rerun CMake if you make any changes
to the CMake files, which will regenerate your Eclipse project files, i.e.,
your Resource Exclusion settings will be lost and the indexer will get
stuck the next time again. You could maybe save the project files to a
different directory; I personally just disabled building the project in
Eclipse (I use a build directory separate from the Eclipse cmake build
dir).

Here is basically how I set up my Eclipse project:
- Use File->Import->General->Import Existing Project. Do not check Copy
  Sources.
- Setup Resource->Resource Filters in Project settings on project root:
  - Exclude */clang/test and */clang/INPUTS project relative paths
    (recursive)
  - Do not use 'Location', causes nullpointer exception on file save
  Make sure the Indexer does not run before you create the filters!
- Delete the [Targets] and [Subprojects] directories (or whatever they are
  called)
- Setup project settings:
  - Setup coding style to new derivative from GNU, edit line-breaks
    settings
  - Setup Project Include-path settings: add the following paths and
    defines:
    - /usr/include
    - /usr/include/c++/4.6
    - /usr/include/c++/4.6/x86_64-linux-gnu
    - /usr/include/x86_64-linux-gnu
    - GET_REGINFO_MC_DESC, GET_REGINFO_HEADER, GET_REGINFO_TARGET_DESC,
      GET_ASSEMBLER_HEADER, GET_MATCHER_IMPLEMENTATION
- Setup Texteditor Font to Deja Sans Mono or any other font that has the
  same character width for normal and bold text!
- Setup a dictionary for 'C++ Spell Checker' (the general spell checker
  dictionary does not remember words).

For debugging, you might want to consider the 'GDB (DSF)' launcher, which
supports pretty printing.
However, I found this way of running GDB a bit unstable (you may need to
restart Eclipse if your Variable view stays empty during debugging), and
back then I had some problems when compiling LLVM/clang as shared
libraries.

I made some initial efforts to create GDB pretty-printing python scripts
similar to the STL pretty printing for the LLVM containers. It is far from
complete, but it has initial support for some of the most common
containers. I had some problems with pretty printing for STL in the past
when I compiled LLVM with clang, but I think it works now with the latest
clang/LLVM/gdb versions.

You can find my code for GDB pretty printing for LLVM together with some
infos on how to setup Eclipse for debugging here, if you want to give it a
go:
https://github.com/t-crest/patmos-llvm/tree/master/utils/gdb

Basically, you need to check out the 'python' folder somewhere, check out
the 'gdbinit' file, modify the paths to your system and make your gdb load
the gdbinit file (beware: the Eclipse Standard Process launcher and the
GDB (DSF) launcher seem to behave differently regarding whether they load
the ~/.gdbinit file by default or not).

Cheers,
  Stefan

On 07/17/2013 10:43 AM, Tilmann Scheller wrote:
> Hi Reed,
>
> On Jul 17, 2013, at 3:19 AM, Reed Kotler wrote:
>> On 07/16/2013 06:01 PM, Reed Kotler wrote:
>>> The Eclipse indexer seems to get stuck in the Clang unittests/AST
>>
>> In Eclipse you can tell it that a given directory is derived, and then
>> it won't try and index it.
>>
>> Probably the more complex clang tests are too involved for the indexer.
> Yeah, I vaguely remember having to exclude some clang tests from
> indexing because they take way too long/way too much memory to index.
> Regards,
>
> Tilmann

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

From letz at grame.fr Wed Jul 17 07:07:35 2013
From: letz at grame.fr (Stéphane Letz)
Date: Wed, 17 Jul 2013 16:07:35 +0200
Subject: [LLVMdev] General strategy to optimize LLVM IR
In-Reply-To: 
References: <88AE2587-C876-4E7C-A4F9-C864367341E7@grame.fr>
Message-ID: 

On 16 Jul 2013, at 20:07, David Blaikie wrote:

> On Tue, Jul 16, 2013 at 8:16 AM, Stéphane Letz wrote:
>> Hi,
>>
>> Our DSL emits sub-optimal LLVM IR that we optimize later on (LLVM IR ==>
>> LLVM IR) before dynamically compiling it with the JIT. We would like to
>> simply follow what clang/clang++ does when compiling with -O1/-O2/-O3
>> options. Our strategy up to now was to look at the opt.cpp code and take
>> part of it in order to implement our optimization code.
>>
>> It appears to be rather difficult to follow the evolution of the LLVM IR
>> optimization strategies. With LLVM 3.3 our optimization code does not
>> produce code as fast as the one produced with clang -O3 anymore. Moreover
>> the new vectorization passes are still not working.
>>
>> Is there a recommended way to add -O1/-O2/-O3 kinds of optimizations on
>> LLVM IR code? Any code to look at beside the opt.cpp tool?
>
> I'm not /entirely/ sure what you're asking. It sounds like you're
> asking "what passes should my compiler's -O1/2/3 flags correspond to"
> and one answer to that is to look at Clang (I think Clang's is
> different from opt/llc's, maybe).

After taking code from the LLVM 3.3 opt.cpp tool, the LLVM IR optimizations
now produce correctly optimized code (by comparing with what
clang -O3 -emit-llvm and opt -O3 give).
Then the LLVM IR is given to the JIT, but now we see a speed regression
compared to what we had with LLVM 3.1 (by comparing how clang -O3 does with
a C version of our generated code and what is compiled using
LLVM IR ==> (optimization passes) ==> LLVM IR ==> JIT).

Our code basically does:

  EngineBuilder builder(fResult->fModule);
  builder.setOptLevel(CodeGenOpt::Aggressive);
  builder.setEngineKind(EngineKind::JIT);
  builder.setUseMCJIT(true);

(I tried to add builder.setMCPU(llvm::sys::getHostCPUName()); without
changes…)

Are there any new things to "activate" in LLVM 3.3 to get similar speed
results to what we had with LLVM 3.1?

Thanks

Stéphane Letz

From rkotler at mips.com Wed Jul 17 07:08:54 2013
From: rkotler at mips.com (reed kotler)
Date: Wed, 17 Jul 2013 07:08:54 -0700
Subject: [LLVMdev] eclipse and gdb
In-Reply-To: <73F06A8B-BC28-443F-B921-831DA8A2B580@apple.com>
References: <51E528ED.5060307@mips.com>
	<8CC341AF-45FF-4C84-9E0A-D5143A6D20CD@apple.com>
	<51E5B6C7.6000404@mips.com>
	<73F06A8B-BC28-443F-B921-831DA8A2B580@apple.com>
Message-ID: <51E6A576.4010501@mips.com>

The latest release of Eclipse (which is Kepler) did not have any indexing
problems. There were bugs in the older ones where the indexer went into an
infinite loop sometimes.

If you have a build with cmake, you can just type make at the command line
there too.

It seems that command line "make" within cmake areas is much faster than
when building in an area using the traditional "configure".

On 07/17/2013 04:35 AM, Tilmann Scheller wrote:
> > What I meant to say is that I have two build directories, one for > Eclipse/Xcode to have a project file for source navigation and one for > my regular build on the console which I use for all the actual > development work. > > Regards, > > Tilmann > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias at grosser.es Wed Jul 17 07:15:01 2013 From: tobias at grosser.es (Tobias Grosser) Date: Wed, 17 Jul 2013 07:15:01 -0700 Subject: [LLVMdev] Analysis of polly-detect overhead in oggenc In-Reply-To: <20130716184233.GA22630@codeaurora.org> References: <51E02B5C.3030208@grosser.es> <727cd06.251b.13fd9059546.Coremail.tanmx_star@yeah.net> <51E19CAF.4050500@grosser.es> <73bc0067.345f.13fdb666d45.Coremail.tanmx_star@yeah.net> <51E2352A.6090706@grosser.es> <7aa4b9a.3bcb.13fdc808c5f.Coremail.tanmx_star@yeah.net> <32e1ec0e.464a.13fddb6c5c7.Coremail.tanmx_star@yeah.net> <20130716184233.GA22630@codeaurora.org> Message-ID: <51E6A6E5.6080305@grosser.es> On 07/16/2013 11:42 AM, Sebastian Pop wrote: > Star Tan wrote: >> I have found that the extremely expensive compile-time overhead comes from the string buffer operation for "INVALID" MACRO in the polly-detect pass. >> Attached is a hack patch file that simply remove the string buffer operation. This patch file can significantly reduce compile-time overhead when compiling big source code. For example, for oggen*8.ll, the compile time is reduced from 40.5261 ( 51.2%) to 5.8813s (15.9%) with this patch file. > > On top of your patch, I have removed from ScopDetection.cpp all printing of LLVM > values, like this: > > - INVALID(AffFunc, "Non affine access function: " << *AccessFunction); > + INVALID(AffFunc, "Non affine access function: "); > > there are a good dozen or so of these pretty printing. 
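To make the cost being measured concrete, here is a standalone toy model of the two macro styles (this is not Polly's actual code — the SCEVLike type, the counter, and the macro names are invented for illustration): the eager form formats the message into a string on every call, while the lazy form never even evaluates the stream insertion unless a -debug-style flag is set.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Stand-ins: FormatCount counts pretty-printing calls, DebugFlag models
// the -debug option, LastFailure mirrors ScopDetection's failure string.
static int FormatCount = 0;
static bool DebugFlag = false;
static std::string LastFailure;

// Hypothetical stand-in for an LLVM value such as *AccessFunction.
struct SCEVLike { int V; };
std::ostream &operator<<(std::ostream &OS, const SCEVLike &S) {
  ++FormatCount; // the expensive pretty-printing happens here
  return OS << "scev(" << S.V << ")";
}

// Eager form (before the patch): always builds the failure string.
#define INVALID_EAGER(MESSAGE)      \
  do {                              \
    std::ostringstream Fmt;         \
    Fmt << MESSAGE;                 \
    LastFailure = Fmt.str();        \
  } while (0)

// Lazy form (after the patch): MESSAGE is only evaluated under -debug.
#define INVALID_LAZY(MESSAGE)       \
  do {                              \
    if (DebugFlag) {                \
      std::ostringstream Fmt;       \
      Fmt << MESSAGE;               \
      LastFailure = Fmt.str();      \
    }                               \
  } while (0)

// Each helper returns how many formatting calls the report performed.
int reportEager(const SCEVLike &S) {
  FormatCount = 0;
  INVALID_EAGER("Non affine access function: " << S);
  return FormatCount;
}

int reportLazy(const SCEVLike &S) {
  FormatCount = 0;
  INVALID_LAZY("Non affine access function: " << S);
  return FormatCount;
}
```

In the eager form the pretty-printing is paid on every rejected region whether or not anyone ever reads LastFailure; in the lazy form the default (no -debug) path does no formatting at all.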
With these changes the compile time spent in ScopDetection drops dramatically to almost 0; here is the longest-running one in the compilation of an Android stack:

    2.1900 ( 13.7%) 0.0100 ( 7.7%) 2.2000 ( 13.6%) 2.2009 ( 13.4%) Polly - Detect static control parts (SCoPs)

Before these changes, the most expensive ScopDetection time used to be a few hundred seconds. Hi Sebastian, I am slightly confused. Star Tan's patch did the following:

    #define INVALID(NAME, MESSAGE) \
      do { \
    -   std::string Buf; \
    -   raw_string_ostream fmt(Buf); \
    -   fmt << MESSAGE; \
    -   fmt.flush(); \
    -   LastFailure = Buf; \
        DEBUG(dbgs() << MESSAGE); \
        DEBUG(dbgs() << "\n"); \
        assert(!Context.Verifying && #NAME);

In my understanding, this patch alone removes all formatting overhead from the default execution. The only use of MESSAGE left is within the debug macro, which will only be evaluated if -debug is given. I am surprised that you see further performance changes from removing/changing the content of MESSAGE. As it is not evaluated, I do not see why this would change performance. Do you have any idea what is going on? I also tested for further performance differences on the oggenc benchmark and could not reproduce your results. Cheers, Tobias From bob.wilson at apple.com Wed Jul 17 08:53:33 2013 From: bob.wilson at apple.com (Bob Wilson) Date: Wed, 17 Jul 2013 08:53:33 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> Message-ID: On Jul 17, 2013, at 2:12 AM, Chandler Carruth wrote: > On Tue, Jul 16, 2013 at 9:34 PM, Bob Wilson wrote: > > On Jul 16, 2013, at 5:51 PM, Eli Friedman wrote: > >> On Tue, Jul 16, 2013 at 5:21 PM, Quentin Colombet wrote: >>> ** Advices Needed ** >>> >>> 1. Decide whether or not we want such capabilities (if we do not we may just >>> add sporadically the support for a new warning/group of warning/error). >>> 2.
Come up with a plan to implement that (assuming we want it). >> >> The frontend should be presenting warnings, not the backend; adding a >> hook which provides the appropriate information shouldn't be too hard. >> Warnings coming out of the backend are very difficult to design well, >> so I don't expect we will add many. Also, keep in mind that the >> information coming out of the backend could be used in other ways; it >> might not make sense for the backend to decide that some piece of >> information should be presented as a warning. (Consider, for example, >> IDE integration to provide additional information about functions and >> loops on demand.) > > I think we definitely need this. In fact, I tried adding something simple earlier this year but gave up when I realized that the task was bigger than I expected. We already have a hook for diagnostics that can be easily extended to handle warnings as well as errors (which is what I tried earlier), but the problem is that it is hardwired for inline assembly errors. To do this right, new warnings really need to be associated with warning groups so that can be controlled from the front-end. > > I agree with Eli that there probably won’t be too many of these. Adding a few new entries to clang’s diagnostic .td files would be fine, except that the backend doesn’t see those. It seems like we will need something in llvm that defines a set of “backend diagnostics”, along with a table in the frontend to correlate those with the corresponding clang diagnostics. That seems awkward at best but maybe it’s tolerable as long as there aren’t many of them. > > I actually think this is the wrong approach, and I don't think it's quite what Eli or I am suggestion (of course, Eli may want to clarify, I'm only really clarifying what *I'm* suggesting. > > I think all of the warnings should be in the frontend, using the standard and existing machinery for generating, controlling, and displaying a warning. 
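A toy sketch of the split Eli describes — the backend exposes facts through a hook, and the frontend decides whether a fact becomes a warning. All names here (FunctionInfo, maybeWarnStackSize, the -Wstack-size group) are invented for illustration; this is not a real LLVM or clang API:

```cpp
#include <cstddef>
#include <string>

// Toy "facts" record a backend could expose through a query API;
// FunctionInfo and its fields are invented stand-ins.
struct FunctionInfo {
  std::string Name;
  std::size_t StackSize; // bytes, as known after frame lowering
};

// The frontend owns the policy: it decides whether a backend fact
// becomes a warning, and under which warning group it is reported.
std::string maybeWarnStackSize(const FunctionInfo &FI, std::size_t Limit) {
  if (FI.StackSize <= Limit)
    return ""; // no diagnostic
  return "warning: stack size of '" + FI.Name + "' (" +
         std::to_string(FI.StackSize) + " bytes) exceeds " +
         std::to_string(Limit) + " [-Wstack-size]";
}
```

The point of this shape is that the backend never formats or classifies a warning; it only answers questions, and the same fact could equally feed an IDE tooltip instead of a diagnostic.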
We already know how to do that well. The difference is that these warnings will need to query the LLVM layer for detailed information through some defined API, and base the warning on this information. This accomplishes two things: > > 1) It ensures the warning machinery is simple, predictable, and integrates cleanly with everything else in Clang. It does so in the best way by simply being the existing machinery. > > 2) It forces us to design reasonable APIs in LLVM to expose to a FE for this information. A consequence of this will be to sort out the layering issues, etc. Another consequence will be a strong chance of finding general purpose APIs in LLVM that can serve many purposes, not just a warning. Consider JITs and other systems that might benefit from having good APIs for querying the size and makeup (at a high level) of a generated function. > > A nice side-effect is that it simplifies the complexity involved for simple warnings -- now it merely is the complexity of exposing the commensurately simple API in LLVM. If instead we go the route of threading a FE interface for *reporting* warnings into LLVM, we have to thread an interface with sufficient power to express many different concepts. I don't understand what you are proposing. First, let me try to clarify my proposal, in case there was any confusion about that. LLVMContext already has a hook for diagnostics, setInlineAsmDiagnosticHandler() et al. I was suggesting that we rename those interfaces to be more generic, add a simple enumeration of whatever diagnostics can be produced from the backend, and add support in clang for mapping those enumeration values to the corresponding clang diagnostics. This would be a small amount of work and would also be consistent with everything you wrote above about reusing the standard and existing machinery for diagnostics in clang. 
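A toy model of the enumeration-plus-handler interface Bob describes, i.e. generalizing the inline-asm-only hook into a generic one. The names below are invented; this is not the real LLVMContext API, just the shape of the proposal:

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical enumeration of backend-produced diagnostics; the real
// set would live in LLVM, and each frontend would map the values onto
// its own diagnostic groups.
enum class BackendDiagKind { InlineAsm, StackSize, UnsupportedIntrinsic };

struct BackendDiag {
  BackendDiagKind Kind;
  std::string Message;
};

// Stand-in for LLVMContext with a generic diagnostic hook in place of
// the inline-asm-only setInlineAsmDiagnosticHandler().
class ToyContext {
  std::function<void(const BackendDiag &)> Handler;

public:
  void setDiagnosticHandler(std::function<void(const BackendDiag &)> H) {
    Handler = std::move(H);
  }
  // Called from "backend" code when it has something to report.
  void report(BackendDiagKind K, const std::string &Msg) const {
    if (Handler)
      Handler({K, Msg});
  }
};

// What a frontend-side mapping table might do with the enum values
// (the group names are illustrative, not clang's actual flags).
std::string frontendGroup(BackendDiagKind K) {
  switch (K) {
  case BackendDiagKind::InlineAsm:            return "-Winline-asm";
  case BackendDiagKind::StackSize:            return "-Wstack-size";
  case BackendDiagKind::UnsupportedIntrinsic: return "-Wunsupported-intrinsic";
  }
  return "";
}
```

A non-clang frontend (or a JIT) would simply install a different handler, or none at all, which keeps the enum-based hook usable outside C compilers.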
For the record, I had started down that path in svn commits 171041 and 171047, but I reverted those changes in 174748 and 174860, since they didn't go far enough to make it work properly and it wasn't clear at the time whether we really needed it. Now let me try to understand what you're suggesting…. You refer several times to having clang query the LLVM layer. Is this to determine whether to emit a diagnostic for some condition? How would this work? Would you have clang insert extra passes to check for various conditions that might require diagnostics? I don't see how else you would do it, since clang's interface to the backend just sets up the PerFunctionPasses, PerModulePasses and CodeGenPasses pass managers and then runs them. Assuming you did add some special passes to check for problems, wouldn't those passes have to duplicate a lot of effort in some cases to find the answers? Take for example the existing warnings in IntrinsicLowering::LowerIntrinsicCall. Those badly need to be cleaned up. Would clang run a special pass to check for intrinsics that are not supported by the target? That pass would need to be implemented as part of clang so that it would have access to clang's diagnostic machinery, but it would also need to know details about what intrinsics are supported by the target. Interesting layering problems there…. Apologies if I'm misinterpreting your proposal. -------------- next part -------------- An HTML attachment was scrubbed... URL: From micah.villmow at smachines.com Wed Jul 17 09:31:19 2013 From: micah.villmow at smachines.com (Micah Villmow) Date: Wed, 17 Jul 2013 16:31:19 +0000 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: References: Message-ID: <3947CD34E13C4F4AB2D94AD35AE3FE600708287E@smi-exchange1.smi.local> We had a similar problem with using LLVM on the GPU @ AMD as many times errors were not known until post-ISel/Resource Allocation. 
Our solution was to embed the errors in the resulting ISA and have the assembler/loader emit/error at that time. So this kind of API would be useful for more than just kernel development. Micah From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Quentin Colombet Sent: Tuesday, July 16, 2013 5:21 PM To: LLVM Developers Mailing List Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. Hi, I would like to start a discussion about error/warning reporting in LLVM and how we can extend the current mechanism to take advantage of clang capabilities. ** Motivation ** Currently LLVM provides a way to report error either directly (print to stderr) or by using a user defined error handler. For instance, in inline asm parsing, we can specify the diagnostic handler to report the errors in clang. The basic idea would be to be able to do that for warnings too (and for other kind of errors?). A motivating example can be found with the following link where we want LLVM to be able to warn on the stack size to help developing kernels: http://llvm.org/bugs/show_bug.cgi?id=4072 By adding this capability, we would be able to have access to all the nice features clang provides with warnings: - Promote it to an error. - Ignore it. ** Challenge ** To be able to take advantage of clang framework for warning/error reporting, warnings have to be associated with warning groups. Thus, we need a way for the backend to specify a front-end warning type. The challenge is, AFAICT (which is not much, I admit), that front-end warning types are statically handled using tablegen representation. ** Advices Needed ** 1. Decide whether or not we want such capabilities (if we do not we may just add sporadically the support for a new warning/group of warning/error). 2. Come up with a plan to implement that (assuming we want it). Thanks for the feedbacks. Cheers, -Quentin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From evan.cheng at apple.com Wed Jul 17 09:38:38 2013 From: evan.cheng at apple.com (Evan Cheng) Date: Wed, 17 Jul 2013 09:38:38 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> Message-ID: <643A0F0E-DA20-4E47-B84F-5BE6912E5145@apple.com> Sent from my iPad On Jul 17, 2013, at 8:53 AM, Bob Wilson wrote: > > On Jul 17, 2013, at 2:12 AM, Chandler Carruth wrote: > >> On Tue, Jul 16, 2013 at 9:34 PM, Bob Wilson wrote: >>> >>> On Jul 16, 2013, at 5:51 PM, Eli Friedman wrote: >>> >>>> On Tue, Jul 16, 2013 at 5:21 PM, Quentin Colombet wrote: >>>>> ** Advices Needed ** >>>>> >>>>> 1. Decide whether or not we want such capabilities (if we do not we may just >>>>> add sporadically the support for a new warning/group of warning/error). >>>>> 2. Come up with a plan to implement that (assuming we want it). >>>> >>>> The frontend should be presenting warnings, not the backend; adding a >>>> hook which provides the appropriate information shouldn't be too hard. >>>> Warnings coming out of the backend are very difficult to design well, >>>> so I don't expect we will add many. Also, keep in mind that the >>>> information coming out of the backend could be used in other ways; it >>>> might not make sense for the backend to decide that some piece of >>>> information should be presented as a warning. (Consider, for example, >>>> IDE integration to provide additional information about functions and >>>> loops on demand.) >>> >>> I think we definitely need this. In fact, I tried adding something simple earlier this year but gave up when I realized that the task was bigger than I expected. We already have a hook for diagnostics that can be easily extended to handle warnings as well as errors (which is what I tried earlier), but the problem is that it is hardwired for inline assembly errors. 
To do this right, new warnings really need to be associated with warning groups so that can be controlled from the front-end. >>> >>> I agree with Eli that there probably won’t be too many of these. Adding a few new entries to clang’s diagnostic .td files would be fine, except that the backend doesn’t see those. It seems like we will need something in llvm that defines a set of “backend diagnostics”, along with a table in the frontend to correlate those with the corresponding clang diagnostics. That seems awkward at best but maybe it’s tolerable as long as there aren’t many of them. >> >> I actually think this is the wrong approach, and I don't think it's quite what Eli or I am suggestion (of course, Eli may want to clarify, I'm only really clarifying what *I'm* suggesting. >> >> I think all of the warnings should be in the frontend, using the standard and existing machinery for generating, controlling, and displaying a warning. We already know how to do that well. The difference is that these warnings will need to query the LLVM layer for detailed information through some defined API, and base the warning on this information. This accomplishes two things: >> >> 1) It ensures the warning machinery is simple, predictable, and integrates cleanly with everything else in Clang. It does so in the best way by simply being the existing machinery. >> >> 2) It forces us to design reasonable APIs in LLVM to expose to a FE for this information. A consequence of this will be to sort out the layering issues, etc. Another consequence will be a strong chance of finding general purpose APIs in LLVM that can serve many purposes, not just a warning. Consider JITs and other systems that might benefit from having good APIs for querying the size and makeup (at a high level) of a generated function. >> >> A nice side-effect is that it simplifies the complexity involved for simple warnings -- now it merely is the complexity of exposing the commensurately simple API in LLVM. 
If instead we go the route of threading a FE interface for *reporting* warnings into LLVM, we have to thread an interface with sufficient power to express many different concepts. > > I don't understand what you are proposing. > > First, let me try to clarify my proposal, in case there was any confusion about that. LLVMContext already has a hook for diagnostics, setInlineAsmDiagnosticHandler() et al. I was suggesting that we rename those interfaces to be more generic, add a simple enumeration of whatever diagnostics can be produced from the backend, and add support in clang for mapping those enumeration values to the corresponding clang diagnostics. This would be a small amount of work and would also be consistent with everything you wrote above about reusing the standard and existing machinery for diagnostics in clang. For the record, I had started down that path in svn commits 171041 and 171047, but I reverted those changes in 174748 and 174860, since they didn't go far enough to make it work properly and it wasn't clear at the time whether we really needed it. > > Now let me try to understand what you're suggesting…. You refer several times to having clang query the LLVM layer. Is this to determine whether to emit a diagnostic for some condition? How would this work? Would you have clang insert extra passes to check for various conditions that might require diagnostics? I don't see how else you would do it, since clang's interface to the backend just sets up the PerFunctionPasses, PerModulePasses and CodeGenPasses pass managers and then runs them. Assuming you did add some special passes to check for problems, wouldn't those passes have to duplicate a lot of effort in some cases to find the answers? Take for example the existing warnings in IntrinsicLowering::LowerIntrinsicCall. Those badly need to be cleaned up. Would clang run a special pass to check for intrinsics that are not supported by the target? 
That pass would need to be implemented as part of clang so that it would have access to clang's diagnostic machinery, but it would also need to know details about what intrinsics are supported by the target. Interesting layering problems there…. Apologies if I'm misinterpreting your proposal. We can't assume clang is the frontend or design a system that only works with clang. There are many systems that use llvm which are not even c compilers. Evan > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From rsandifo at linux.vnet.ibm.com Wed Jul 17 09:52:16 2013 From: rsandifo at linux.vnet.ibm.com (Richard Sandiford) Date: Wed, 17 Jul 2013 17:52:16 +0100 Subject: [LLVMdev] Help with subtarget features and context-dependent asm parsers Message-ID: <87txjti0xr.fsf@sandifor-thinkpad.stglab.manchester.uk.ibm.com> I'm trying to add some instructions that are only available on certain processors. These instructions use context-dependent parsers. Everything works fine for the valid cases, but if you try to use an instruction on processors that don't support it, the asm parser says: /tmp/foo.s:1:2: error: invalid operands for instruction sllk %r2,%r3,1 ^ rather than: /tmp/foo.s:1:2: error: instruction requires: distinct-ops sllk %r2,%r3,1 ^ This is because MatchOperandParserImpl() skips custom parsers if the subtarget feature isn't enabled, so the instruction is parsed using the default operand parser instead. Then MatchInstructionImpl() only returns Match_MissingFeature if an otherwise good match is found, which in my case requires the custom parser to be used. ARM seems to rely on the current MatchOperandParserImpl() behaviour, so I'm not going to suggest changing it unconditionally. 
But on SystemZ there aren't any cases where the choice of parse routine depends on the enabled features. It'd be better just to parse in the same way regardless and check for errors at the end. The patch below does that by adding an optional argument to MatchOperandParserImpl(). It seems really ugly though. Does anyone have any better suggestions? Thanks, Richard -------------- next part -------------- A non-text attachment was scrubbed... Name: check-features.diff Type: text/x-patch Size: 1704 bytes Desc: not available URL: From t.p.northover at gmail.com Wed Jul 17 09:59:17 2013 From: t.p.northover at gmail.com (Tim Northover) Date: Wed, 17 Jul 2013 17:59:17 +0100 Subject: [LLVMdev] Help with subtarget features and context-dependent asm parsers In-Reply-To: <87txjti0xr.fsf@sandifor-thinkpad.stglab.manchester.uk.ibm.com> References: <87txjti0xr.fsf@sandifor-thinkpad.stglab.manchester.uk.ibm.com> Message-ID: > /tmp/foo.s:1:2: error: instruction requires: distinct-ops > sllk %r2,%r3,1 > ^ That seems like it would be a good improvement for all targets. > ARM seems to rely on the current MatchOperandParserImpl() behaviour, > so I'm not going to suggest changing it unconditionally. Presumably you switched it and looked at what fell over; do you remember what kind of problems ARM had? Perhaps we can fix ARM so that your change works there too. Don't worry if not, I can try poking it myself based on your patch and see what happens. Cheers. Tim. 
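The "parse the same way regardless, check features at the end" idea can be sketched with a toy matcher. The instruction table and feature names below are invented for illustration; real matchers are TableGen-generated and return Match_MissingFeature rather than strings:

```cpp
#include <map>
#include <set>
#include <string>

// Toy instruction table: mnemonic -> required subtarget feature
// (empty string = always available). Invented for illustration.
static const std::map<std::string, std::string> InsnTable = {
    {"sll", ""},
    {"sllk", "distinct-ops"},
};

// Match the same way regardless of enabled features, and only at the
// end report a missing feature -- so "sllk" on an older CPU yields
// "instruction requires: distinct-ops" rather than the misleading
// "invalid operands for instruction".
std::string matchInstruction(const std::string &Mnemonic,
                             const std::set<std::string> &Features) {
  auto It = InsnTable.find(Mnemonic);
  if (It == InsnTable.end())
    return "error: invalid instruction";
  const std::string &Needed = It->second;
  if (!Needed.empty() && Features.count(Needed) == 0)
    return "error: instruction requires: " + Needed;
  return "ok";
}
```

The observable improvement is purely in the diagnostic: valid input still matches exactly as before, and only the failure message changes.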
From rsandifo at linux.vnet.ibm.com Wed Jul 17 10:26:14 2013 From: rsandifo at linux.vnet.ibm.com (Richard Sandiford) Date: Wed, 17 Jul 2013 18:26:14 +0100 Subject: [LLVMdev] Help with subtarget features and context-dependent asm parsers In-Reply-To: (Tim Northover's message of "Wed, 17 Jul 2013 17:59:17 +0100") References: <87txjti0xr.fsf@sandifor-thinkpad.stglab.manchester.uk.ibm.com> Message-ID: <87k3kphzd5.fsf@sandifor-thinkpad.stglab.manchester.uk.ibm.com> Tim Northover writes: >> /tmp/foo.s:1:2: error: instruction requires: distinct-ops >> sllk %r2,%r3,1 >> ^ > > That seems like it would be a good improvement for all targets. Thanks, sounds like it might be more acceptable than I thought :-) >> ARM seems to rely on the current MatchOperandParserImpl() behaviour, >> so I'm not going to suggest changing it unconditionally. > > Presumably you switched it and looked at what fell over; do you > remember what kind of problems ARM had? Perhaps we can fix ARM so that > your change works there too. Yeah, there were two new MC failures. The first was: /home/richards/llvm/build/Debug+Asserts/bin/llvm-mc -triple=thumbv7-apple-darwin -mcpu=cortex-a8 -show-encoding < /home/richards/llvm/src/test/MC/ARM/basic -thumb2-instructions.s | /home/richards/llvm/build/Debug+Asserts/bin/FileCheck /home/richards/llvm/src/test/MC/ARM/basic-thumb2-instructions.s -- Exit Code: 1 Command Output (stderr): -- :1356:9: error: instruction requires: armv7m mrs r8, apsr ^ :1357:9: error: instruction requires: armv7m mrs r8, cpsr ^ :1358:9: error: instruction requires: armv7m mrs r8, spsr ^ and the second was the same for basic-arm-instructions.s. The problem seems to be that the MSRMask parser is then always used, even for non-M-class. 
Richard From grosbach at apple.com Wed Jul 17 10:58:24 2013 From: grosbach at apple.com (Jim Grosbach) Date: Wed, 17 Jul 2013 10:58:24 -0700 Subject: [LLVMdev] Help with subtarget features and context-dependent asm parsers In-Reply-To: <87k3kphzd5.fsf@sandifor-thinkpad.stglab.manchester.uk.ibm.com> References: <87txjti0xr.fsf@sandifor-thinkpad.stglab.manchester.uk.ibm.com> <87k3kphzd5.fsf@sandifor-thinkpad.stglab.manchester.uk.ibm.com> Message-ID: <755ECF6A-B38F-4782-B4D8-4F10B455375D@apple.com> On Jul 17, 2013, at 10:26 AM, Richard Sandiford wrote: > Tim Northover writes: >>> /tmp/foo.s:1:2: error: instruction requires: distinct-ops >>> sllk %r2,%r3,1 >>> ^ >> >> That seems like it would be a good improvement for all targets. > > Thanks, sounds like it might be more acceptable than I thought :-) FWIW, I'm the guy to blame for the current implementation and I like the idea. Getting it right may be marginally tricky, but the direction is good. Better diagnostics from the assemblers is a very good thing. > >>> ARM seems to rely on the current MatchOperandParserImpl() behaviour, >>> so I'm not going to suggest changing it unconditionally. >> >> Presumably you switched it and looked at what fell over; do you >> remember what kind of problems ARM had? Perhaps we can fix ARM so that >> your change works there too. > > Yeah, there were two new MC failures. 
The first was: > > /home/richards/llvm/build/Debug+Asserts/bin/llvm-mc -triple=thumbv7-apple-darwin -mcpu=cortex-a8 -show-encoding < /home/richards/llvm/src/test/MC/ARM/basic > -thumb2-instructions.s | /home/richards/llvm/build/Debug+Asserts/bin/FileCheck /home/richards/llvm/src/test/MC/ARM/basic-thumb2-instructions.s > -- > Exit Code: 1 > Command Output (stderr): > -- > :1356:9: error: instruction requires: armv7m > mrs r8, apsr > ^ > :1357:9: error: instruction requires: armv7m > mrs r8, cpsr > ^ > :1358:9: error: instruction requires: armv7m > mrs r8, spsr > ^ > > and the second was the same for basic-arm-instructions.s. The problem seems > to be that the MSRMask parser is then always used, even for non-M-class. This seems fixable. The custom parsers that are only valid for certain sub targets could easily have an explicit early-exit if the active sub target isn't what it's looking for. Would that be sufficient here? -Jim From t.p.northover at gmail.com Wed Jul 17 11:14:17 2013 From: t.p.northover at gmail.com (Tim Northover) Date: Wed, 17 Jul 2013 18:14:17 +0000 Subject: [LLVMdev] Help with subtarget features and context-dependent asm parsers In-Reply-To: <755ECF6A-B38F-4782-B4D8-4F10B455375D@apple.com> References: <87txjti0xr.fsf@sandifor-thinkpad.stglab.manchester.uk.ibm.com> <87k3kphzd5.fsf@sandifor-thinkpad.stglab.manchester.uk.ibm.com> <755ECF6A-B38F-4782-B4D8-4F10B455375D@apple.com> Message-ID: > This seems fixable. The custom parsers that are only valid for certain sub targets could easily have an explicit early-exit if the active sub target isn't what it's looking for. Would that be sufficient here? It doesn't look like there are different encodings for the same string so I don't think even that would be necessary. How about partitioning by MRS_APSR and MRS_OtherM instead of the current MRS_M and MRS_AR? I'll see if I can put a patch together. Tim. 
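Jim's early-exit suggestion, in toy form: a custom operand parser that declines when the active subtarget does not apply, so the generic path (and its feature diagnostics) handles the token instead. The mask names and the subtarget check are simplified for illustration and do not reflect the real ARM AsmParser:

```cpp
#include <string>

enum class ParseResult { Match, NoMatch };

// Toy custom operand parser that bails out early when the active
// subtarget is not an M-class core, letting the generic operand parser
// (and its end-of-match feature diagnostics) handle the token instead.
ParseResult parseMSRMask(const std::string &Token, bool IsMClass,
                         std::string &Parsed) {
  if (!IsMClass)
    return ParseResult::NoMatch; // early exit: wrong subtarget
  if (Token == "apsr" || Token == "cpsr" || Token == "spsr") {
    Parsed = Token;
    return ParseResult::Match;
  }
  return ParseResult::NoMatch;
}
```

With this shape, the M-class-only parser can never claim a token on an A/R-class target, so the non-M path keeps producing its own, correct match or diagnostic.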
From grosbach at apple.com Wed Jul 17 11:26:30 2013 From: grosbach at apple.com (Jim Grosbach) Date: Wed, 17 Jul 2013 11:26:30 -0700 Subject: [LLVMdev] Help with subtarget features and context-dependent asm parsers In-Reply-To: References: <87txjti0xr.fsf@sandifor-thinkpad.stglab.manchester.uk.ibm.com> <87k3kphzd5.fsf@sandifor-thinkpad.stglab.manchester.uk.ibm.com> <755ECF6A-B38F-4782-B4D8-4F10B455375D@apple.com> Message-ID: On Jul 17, 2013, at 11:14 AM, Tim Northover wrote: >> This seems fixable. The custom parsers that are only valid for certain sub targets could easily have an explicit early-exit if the active sub target isn't what it's looking for. Would that be sufficient here? > > It doesn't look like there are different encodings for the same string > so I don't think even that would be necessary. How about partitioning > by MRS_APSR and MRS_OtherM instead of the current MRS_M and MRS_AR? > No objection. I vaguely recall there were some encoding differences for the same input strings between the sub targets, which is what motivated the split. If that's a faulty recollection, then yeah, merging them sounds great. > I'll see if I can put a patch together. Cool. -Jim From qcolombet at apple.com Wed Jul 17 11:34:27 2013 From: qcolombet at apple.com (Quentin Colombet) Date: Wed, 17 Jul 2013 11:34:27 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: <643A0F0E-DA20-4E47-B84F-5BE6912E5145@apple.com> References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <643A0F0E-DA20-4E47-B84F-5BE6912E5145@apple.com> Message-ID: <7BFDA791-4915-4785-8179-8D9D6C5367B5@apple.com> Hi, Thanks all for the insight comments. Let me sum up at a high level what proposals we actually have (sorry if I misinterpreted or missed something, do not hesitate to correct me): 1. Make LLVM defines some APIs that exposes some internal information so that a front-end can use them to build-up some diagnostics. 2. 
Make LLVM builds up a diagnostic and let a front-end maps this diagnostic to the right warning/error group. In my opinion, both approaches are orthogonal and have different advantages based on what goal we are pursuing. To be more specific, with the first approach, front-end people can come up with new warnings/errors and diagnostics without having to modify the back-end. On the other hand, with the second approach, back-end people can emit new warnings/errors without having to modify the front-end (BTW, it would be nice if we come up at least in clang with a consistent way to pass down options for emitting back-end warning/error when applicable, i.e., without -mllvm but with -W). What people are thinking? Thanks again for sharing your thoughts, I appreciate. Cheers, -Quentin On Jul 17, 2013, at 9:38 AM, Evan Cheng wrote: > > > Sent from my iPad > > On Jul 17, 2013, at 8:53 AM, Bob Wilson wrote: > >> >> On Jul 17, 2013, at 2:12 AM, Chandler Carruth wrote: >> >>> On Tue, Jul 16, 2013 at 9:34 PM, Bob Wilson wrote: >>> >>> On Jul 16, 2013, at 5:51 PM, Eli Friedman wrote: >>> >>>> On Tue, Jul 16, 2013 at 5:21 PM, Quentin Colombet wrote: >>>>> ** Advices Needed ** >>>>> >>>>> 1. Decide whether or not we want such capabilities (if we do not we may just >>>>> add sporadically the support for a new warning/group of warning/error). >>>>> 2. Come up with a plan to implement that (assuming we want it). >>>> >>>> The frontend should be presenting warnings, not the backend; adding a >>>> hook which provides the appropriate information shouldn't be too hard. >>>> Warnings coming out of the backend are very difficult to design well, >>>> so I don't expect we will add many. Also, keep in mind that the >>>> information coming out of the backend could be used in other ways; it >>>> might not make sense for the backend to decide that some piece of >>>> information should be presented as a warning. 
(Consider, for example, >>>> IDE integration to provide additional information about functions and >>>> loops on demand.) >>> >>> I think we definitely need this. In fact, I tried adding something simple earlier this year but gave up when I realized that the task was bigger than I expected. We already have a hook for diagnostics that can be easily extended to handle warnings as well as errors (which is what I tried earlier), but the problem is that it is hardwired for inline assembly errors. To do this right, new warnings really need to be associated with warning groups so that can be controlled from the front-end. >>> >>> I agree with Eli that there probably won’t be too many of these. Adding a few new entries to clang’s diagnostic .td files would be fine, except that the backend doesn’t see those. It seems like we will need something in llvm that defines a set of “backend diagnostics”, along with a table in the frontend to correlate those with the corresponding clang diagnostics. That seems awkward at best but maybe it’s tolerable as long as there aren’t many of them. >>> >>> I actually think this is the wrong approach, and I don't think it's quite what Eli or I am suggestion (of course, Eli may want to clarify, I'm only really clarifying what *I'm* suggesting. >>> >>> I think all of the warnings should be in the frontend, using the standard and existing machinery for generating, controlling, and displaying a warning. We already know how to do that well. The difference is that these warnings will need to query the LLVM layer for detailed information through some defined API, and base the warning on this information. This accomplishes two things: >>> >>> 1) It ensures the warning machinery is simple, predictable, and integrates cleanly with everything else in Clang. It does so in the best way by simply being the existing machinery. >>> >>> 2) It forces us to design reasonable APIs in LLVM to expose to a FE for this information. 
A consequence of this will be to sort out the layering issues, etc. Another consequence will be a strong chance of finding general purpose APIs in LLVM that can serve many purposes, not just a warning. Consider JITs and other systems that might benefit from having good APIs for querying the size and makeup (at a high level) of a generated function. >>> >>> A nice side-effect is that it simplifies the complexity involved for simple warnings -- now it merely is the complexity of exposing the commensurately simple API in LLVM. If instead we go the route of threading a FE interface for *reporting* warnings into LLVM, we have to thread an interface with sufficient power to express many different concepts. >> >> I don't understand what you are proposing. >> >> First, let me try to clarify my proposal, in case there was any confusion about that. LLVMContext already has a hook for diagnostics, setInlineAsmDiagnosticHandler() et al. I was suggesting that we rename those interfaces to be more generic, add a simple enumeration of whatever diagnostics can be produced from the backend, and add support in clang for mapping those enumeration values to the corresponding clang diagnostics. This would be a small amount of work and would also be consistent with everything you wrote above about reusing the standard and existing machinery for diagnostics in clang. For the record, I had started down that path in svn commits 171041 and 171047, but I reverted those changes in 174748 and 174860, since they didn't go far enough to make it work properly and it wasn't clear at the time whether we really needed it. >> >> Now let me try to understand what you're suggesting…. You refer several times to having clang query the LLVM layer. Is this to determine whether to emit a diagnostic for some condition? How would this work? Would you have clang insert extra passes to check for various conditions that might require diagnostics? 
I don't see how else you would do it, since clang's interface to the backend just sets up the PerFunctionPasses, PerModulePasses and CodeGenPasses pass managers and then runs them. Assuming you did add some special passes to check for problems, wouldn't those passes have to duplicate a lot of effort in some cases to find the answers? Take for example the existing warnings in IntrinsicLowering::LowerIntrinsicCall. Those badly need to be cleaned up. Would clang run a special pass to check for intrinsics that are not supported by the target? That pass would need to be implemented as part of clang so that it would have access to clang's diagnostic machinery, but it would also need to know details about what intrinsics are supported by the target. Interesting layering problems there…. Apologies if I'm misinterpreting your proposal. > > We can't assume clang is the frontend or design a system that only works with clang. There are many systems that use llvm which are not even c compilers. > > Evan > >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From dnovillo at google.com Wed Jul 17 12:35:41 2013 From: dnovillo at google.com (Diego Novillo) Date: Wed, 17 Jul 2013 12:35:41 -0700 Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage. In-Reply-To: <51E087E2.5040101@gmail.com> References: <51E087E2.5040101@gmail.com> Message-ID: On Fri, Jul 12, 2013 at 3:49 PM, Shuxin Yang wrote: > 3. 
How to parallelize post-IPO stage > ==================================== > > From 5k' high, the concept is very simple, just to > step 1).divide the merged IR into small pieces, > step 2).and compile each of these pieces independently. > step 3) the objects of each piece are fed back to the linker to be linked > into an executable, or a dynamic lib. You seem to be describing GCC's strategy for whole program optimization (http://gcc.gnu.org/wiki/LinkTimeOptimization). What is a bit confusing in your description is that you seem to want to do more work after IPO is *done*. If the optimizations are done, then there is no need to parallelize anything. In GCC, we have two parallel stages: the generation of bytecode (which is naturally parallelized by your build system) and the final optimization phase, which is parallelized by partitioning the callgraph into disjoint subsets. These subsets are then parallelized by spawning the compiler on each of the partitions. The only sequential part is the actual analysis (what we call WPA or whole program analysis). That is a single-threaded phase that makes all the optimization decisions, annotates the summaries on each callgraph node, partitions the callgraph and then spawns all the optimizers to work on each section of the callgraph. Diego. From shuxin.llvm at gmail.com Wed Jul 17 13:06:49 2013 From: shuxin.llvm at gmail.com (Shuxin Yang) Date: Wed, 17 Jul 2013 13:06:49 -0700 Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage. In-Reply-To: References: <51E087E2.5040101@gmail.com> Message-ID: <51E6F959.3040103@gmail.com> On 7/17/13 12:35 PM, Diego Novillo wrote: > On Fri, Jul 12, 2013 at 3:49 PM, Shuxin Yang wrote: > >> 3. How to parallelize post-IPO stage >> ==================================== >> >> From 5k' high, the concept is very simple, just to >> step 1).divide the merged IR into small pieces, >> step 2).and compile each of these pieces independently.
>> step 3) the objects of each piece are fed back to the linker to be linked >> into an executable, or a dynamic lib. > You seem to be describing GCC's strategy for whole program > optimization (http://gcc.gnu.org/wiki/LinkTimeOptimization). Hi, Diego: Thank you for the comment. Quite honestly, I'm not very familiar with gcc's LTO implementation, and now I'm not allowed to read any GPL V3 code. What I'm describing here is somewhat similar to Open64's IPA implementation. I did port gcc before and of course read its documentation. Based on my little knowledge of its internals, I think gcc's "whole-program" mode is almost identical to Open64's IPA in spirit, and I guess the gcc community likely borrowed some ideas from Open64. On the other hand, as far as I understood the oldish gcc implementation before I joined Apple, I guess the "whole-program mode" in gcc is a misnomer. I don't want to call this work "whole-program mode"; I'd like to call it "big program mode" or something like that, as I don't care whether the binary being built sees the whole program or not, while the analyses/opt do care about the difference between them. > > What is a bit confusing in your description is that you seem to > want to do more work after IPO is *done*. IPO is difficult to parallelize. What I am trying to do is parallelize the post-LTO compilation stage, including (optionally) LNO, scalar opt, and the compile-time-hogging CodeGen. > If the optimizations are > done, then there is no need to parallelize anything. > > In GCC, we have two parallel stages: the generation of bytecode (which > is naturally parallelized by your build system) and the final > optimization phase, which is parallelized by partitioning the > callgraph into disjoint subsets. These subsets are then parallelized > by spawning the compiler on each of the partitions. I call the 1st "parallel stage" pre-ipo, and the 2nd "parallel stage" post-IPO :-), and the IPO (which you call WPA) is sandwiched in the middle.
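The three-stage shape described above — parallel pre-IPO, a serial IPO/WPA phase, then parallel post-IPO over disjoint partitions — can be sketched in a few dozen lines of C++. This is an illustrative sketch only: the `Func` records, size estimates, and greedy heuristic are hypothetical stand-ins, not LLVM's (or GCC's) actual implementation. The point worth noting is that the partitioning is deterministic — a pure function of the input module — which matters for the reproducible-output concerns raised further down this thread.

```cpp
#include <algorithm>
#include <future>
#include <string>
#include <vector>

// A stand-in record for a function in the merged (post-IPO) module.
struct Func { std::string name; unsigned size; };

// Greedily pack functions into numParts buckets, heaviest first.
// Sorting the input first makes the result deterministic: it depends
// only on the merged module, never on thread timing or the host system.
std::vector<std::vector<std::string>>
partitionModule(std::vector<Func> funcs, unsigned numParts) {
  std::sort(funcs.begin(), funcs.end(), [](const Func &a, const Func &b) {
    return a.size != b.size ? a.size > b.size : a.name < b.name;
  });
  std::vector<std::vector<std::string>> buckets(numParts);
  std::vector<unsigned> loads(numParts, 0);
  for (const Func &f : funcs) {
    // Lightest bucket wins; min_element breaks ties by lowest index.
    size_t i = std::min_element(loads.begin(), loads.end()) - loads.begin();
    buckets[i].push_back(f.name);
    loads[i] += f.size;
  }
  return buckets;
}

// Stand-in for running scalar opts + codegen on one IR piece, producing
// one object file per partition, later handed back to the linker.
std::string compilePartition(const std::vector<std::string> &names) {
  std::string obj = "obj(";
  for (size_t i = 0; i < names.size(); ++i)
    obj += (i ? "," : "") + names[i];
  return obj + ")";
}

// Parallel post-IPO stage: one async job per partition.
std::vector<std::string> postIPO(const std::vector<Func> &funcs, unsigned n) {
  std::vector<std::future<std::string>> jobs;
  for (auto &part : partitionModule(funcs, n))
    jobs.push_back(std::async(std::launch::async, compilePartition, part));
  std::vector<std::string> objects;
  for (auto &j : jobs)
    objects.push_back(j.get()); // output order is fixed, not timing-dependent
  return objects;
}
```

Because the object list comes back in partition order regardless of which worker finishes first, the final link step sees the same inputs on every run — the property the stable-binary discussion later in the thread asks for.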
> > The only sequential part is the actual analysis (what we call WPA or > whole program analysis). While not included in the proposal, I was proposing another (earlier) level of partitioning in order to build outrageously huge programs and to parallelize the "WPA". The benefit is to parallelize the IPO/"WPA"; however, the cost is not seeing the "whole program". One of my coworkers told me we care more about the benefit reaped from seeing the "whole program" than about the improved compile time at the cost of compiling a bunch of "incomplete program"s independently. > That is a single threaded phase that makes > all the optimization decisions, annotates the summaries on each > callgraph node, partitions the callgraph and then spawns all the > optimizers to work on each section of the callgraph. > This is what I called "post-IPO" :-) From echristo at gmail.com Wed Jul 17 13:16:17 2013 From: echristo at gmail.com (Eric Christopher) Date: Wed, 17 Jul 2013 13:16:17 -0700 Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage. In-Reply-To: <51E6F959.3040103@gmail.com> References: <51E087E2.5040101@gmail.com> <51E6F959.3040103@gmail.com> Message-ID: On Wed, Jul 17, 2013 at 1:06 PM, Shuxin Yang wrote: > > On 7/17/13 12:35 PM, Diego Novillo wrote: >> >> On Fri, Jul 12, 2013 at 3:49 PM, Shuxin Yang >> wrote: >> >>> 3. How to parallelize post-IPO stage >>> ==================================== >>> >>> From 5k' high, the concept is very simple, just to >>> step 1).divide the merged IR into small pieces, >>> step 2).and compile each of these pieces independently. >>> step 3) the objects of each piece are fed back to the linker to be >>> linked >>> into an executable, or a dynamic lib. >> >> You seem to be describing GCC's strategy for whole program >> optimization (http://gcc.gnu.org/wiki/LinkTimeOptimization). > > > Hi, Diego: > > Thank you for the comment. Quite honestly, I'm not very familiar with > gcc's LTO implementation, and now I'm not allowed to read any GPL V3 code.
> > What I'm describing here is somewhat similar to Open64's IPA implementation. > I did port gcc before and of course read its documentation. Based on my > little knowledge of its internals, I think gcc's "whole-program" > mode is almost identical to Open64's IPA in spirit, and I guess the gcc community > likely > borrowed some ideas from Open64. > > On the other hand, as far as I understood the oldish gcc > implementation > before I joined Apple, I guess the "whole-program mode" in gcc is a misnomer. > I don't want to call this work "whole-program mode"; I'd like to call it "big > program mode" > or something like that, as I don't care whether the binary being built sees the > whole program or > not, while the analyses/opt > do care about the difference between them. > > >> >> What is a bit confusing in your description is that you seem to >> want to do more work after IPO is *done*. > > IPO is difficult to parallelize. What I am trying to do is parallelize the > post-LTO compilation > stage, including (optionally) LNO, scalar opt, and the compile-time-hogging > CodeGen. > >> If the optimizations are >> done, then there is no need to parallelize anything. >> >> In GCC, we have two parallel stages: the generation of bytecode (which >> is naturally parallelized by your build system) and the final >> optimization phase, which is parallelized by partitioning the >> callgraph into disjoint subsets. These subsets are then parallelized >> by spawning the compiler on each of the partitions. > > I call the 1st "parallel stage" pre-ipo, and the 2nd "parallel stage" > post-IPO :-), > and the IPO (which you call WPA) is sandwiched in the middle. > > >> >> The only sequential part is the actual analysis (what we call WPA or >> whole program analysis). > > While not included in the proposal, I was proposing another (earlier) level > of partitioning in order > to build outrageously huge programs and to parallelize the "WPA".
The > benefit > is to parallelize the IPO/"WPA"; however, the cost is not seeing the "whole > program". > > One of my coworkers told me we care more about the benefit reaped from seeing the > "whole program" > than about the improved compile time at the cost of compiling a bunch of "incomplete > program"s independently. > > > >> That is a single threaded phase that makes >> all the optimization decisions, annotates the summaries on each >> callgraph node, partitions the callgraph and then spawns all the >> optimizers to work on each section of the callgraph. >> > This is what I called "post-IPO" :-) > OK, yeah, this was horribly confusing. IPA would probably have been a better choice of term (or WPA) with the 'A' for analysis and the 'O' for optimization where we're doing the work. It makes your entire proposal make some sense now ;) I'll need to go back and reread it with that in mind. -eric From xinliangli at gmail.com Wed Jul 17 13:45:43 2013 From: xinliangli at gmail.com (Xinliang David Li) Date: Wed, 17 Jul 2013 13:45:43 -0700 Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage. In-Reply-To: References: <51E087E2.5040101@gmail.com> <51E6F959.3040103@gmail.com> Message-ID: There are so many related terminologies that are usually confused or misused. They should be normalized before discussion. Here is one version: o IPA --> interprocedural analysis and optimization o IPO/CMO --> interprocedural optimization (cross-module IPA). In different contexts, IPO may mean slightly different things. For instance, it may mean the whole compilation pipeline, or just the serial part when all IR files are merged and analyzed. IPO/CMO can operate in both whole-program mode and non-whole-program mode. Note that IPA is more general than IPO/CMO, as it covers the single-module case too. o LTO --> one type of implementation of IPO/CMO, which involves the linker/linker plugin. It means the whole IPO pipeline.
The following is a list of GCC-specific terms: o WHOPR --> GCC LTO in whole program mode o WPA --> the serial/merge part of the IPO pipeline o LGEN --> pre-IPO compilation o LTRANS --> post-IPO compilation. Shuxin's proposal is about parallelizing 'LTRANS' in LLVM. thanks, David On Wed, Jul 17, 2013 at 1:16 PM, Eric Christopher wrote: > On Wed, Jul 17, 2013 at 1:06 PM, Shuxin Yang wrote: >> >> On 7/17/13 12:35 PM, Diego Novillo wrote: >>> >>> On Fri, Jul 12, 2013 at 3:49 PM, Shuxin Yang >>> wrote: >>> >>>> 3. How to parallelize post-IPO stage >>>> ==================================== >>>> >>>> From 5k' high, the concept is very simple, just to >>>> step 1).divide the merged IR into small pieces, >>>> step 2).and compile each of these pieces independently. >>>> step 3) the objects of each piece are fed back to the linker to be >>>> linked >>>> into an executable, or a dynamic lib. >>> >>> You seem to be describing GCC's strategy for whole program >>> optimization (http://gcc.gnu.org/wiki/LinkTimeOptimization). >> >> >> Hi, Diego: >> >> Thank you for the comment. Quite honestly, I'm not very familiar with >> gcc's LTO implementation, and now I'm not allowed to read any GPL V3 code. >> >> What I'm describing here is somewhat similar to Open64's IPA implementation. >> I did port gcc before and of course read its documentation. Based on my >> little knowledge of its internals, I think gcc's "whole-program" >> mode is almost identical to Open64's IPA in spirit, and I guess the gcc community >> likely >> borrowed some ideas from Open64. >> >> On the other hand, as far as I understood the oldish gcc >> implementation >> before I joined Apple, I guess the "whole-program mode" in gcc is a misnomer. >> I don't want to call this work "whole-program mode"; I'd like to call it "big >> program mode" >> or something like that, as I don't care whether the binary being built sees the >> whole program or >> not, while the analyses/opt >> do care about the difference between them.
>> >> >>> >>> What is a bit confusing in your description is that you seem to >>> want to do more work after IPO is *done*. >> >> IPO is difficult to parallelize. What I am trying to do is parallelize the >> post-LTO compilation >> stage, including (optionally) LNO, scalar opt, and the compile-time-hogging >> CodeGen. >> >> >>> If the optimizations are >>> done, then there is no need to parallelize anything. >>> >>> In GCC, we have two parallel stages: the generation of bytecode (which >>> is naturally parallelized by your build system) and the final >>> optimization phase, which is parallelized by partitioning the >>> callgraph into disjoint subsets. These subsets are then parallelized >>> by spawning the compiler on each of the partitions. >> >> I call the 1st "parallel stage" pre-ipo, and the 2nd "parallel stage" >> post-IPO :-), >> and the IPO (which you call WPA) is sandwiched in the middle. >> >> >>> >>> The only sequential part is the actual analysis (what we call WPA or >>> whole program analysis). >> >> While not included in the proposal, I was proposing another (earlier) level >> of partitioning in order >> to build outrageously huge programs and to parallelize the "WPA". The >> benefit >> is to parallelize the IPO/"WPA"; however, the cost is not seeing the "whole >> program". >> >> One of my coworkers told me we care more about the benefit reaped from seeing the >> "whole program" >> than about the improved compile time at the cost of compiling a bunch of "incomplete >> program"s independently. >> >> >> >>> That is a single threaded phase that makes >>> all the optimization decisions, annotates the summaries on each >>> callgraph node, partitions the callgraph and then spawns all the >>> optimizers to work on each section of the callgraph. >>> >> This is what I called "post-IPO" :-) >> > > OK, yeah, this was horribly confusing.
IPA would probably have been a > better choice of term (or WPA) with the 'A' for analysis and the 'O' > for optimization where we're doing the work. > > It makes your entire proposal make some sense now ;) > > I'll need to go back and reread it with that in mind. > > -eric > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From kledzik at apple.com Wed Jul 17 16:12:59 2013 From: kledzik at apple.com (Nick Kledzik) Date: Wed, 17 Jul 2013 16:12:59 -0700 Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage. In-Reply-To: <7557C3F3-0682-479F-B93D-07EB2BDC16D0@apple.com> References: <51E087E2.5040101@gmail.com> <0DFE5B55-3FC3-435C-9971-040ED72D0328@apple.com> <7557C3F3-0682-479F-B93D-07EB2BDC16D0@apple.com> Message-ID: On Jul 14, 2013, at 7:07 PM, Andrew Trick wrote: > The partitioning should be deterministic. It’s just that the linker output now depends on the partitioning heuristics. As long as that decision is based on the input (not the host system), then it still meets Eric’s requirements. I just think it’s unfortunate that post-IPO partitioning (or more generally, parallel codegen) affects the output, but may be hard to avoid. It would be nice to be able to tune the partitioning for compile time without worrying about code quality. I also want to chime in on the importance of stable binary outputs. And not just that the same compiler and same sources produce the same binary, but that minor changes to either should cause minor changes to the output binary. For software updates, Apple updater tries to download only the delta to the binaries, so we want those to be as small as possible. In addition, it often happens late in an OS release cycle that some critical bug is found and the fix is in the compiler.
To qualify it, we rebuild the whole OS with the new compiler, then compare all the binaries in the OS, making sure only things related to the bug are changed. > Sorry for the tangential thought here... it seems that most of Shuxin’s proposal is actually independent of LTO, even though the prototype and primary goal is enabling LTO. This is very insightful, Andrew! Rather than think of this (post-IPO parallelization) as an LTO enhancement, it should be that the backend simply has some threshold (e.g. number of functions) which causes it to start parallelizing the last steps. On Jul 12, 2013, at 3:49 PM, Shuxin Yang wrote: > There are two camps: one camp advocates compiling partitions via multi-process, > the other one favors multi-thread. There is also a variant of multi-threading that is popular at Apple. Our OSs have libdispatch, which makes it easy to queue up chunks of work. The OS looks at the overall system balance and uses the ideal number of threads to process the work queue. > The compiler used to generate a single object file from the merged > IR, now it will generate multiple of them, one for each partition. I have not studied the MC interface, but why does each partition need to generate a separate object file? Why can’t the first partition that is done create an object file, and as other partitions finish, they just append to that object file? -Nick -------------- next part -------------- An HTML attachment was scrubbed... URL: From shuxin.llvm at gmail.com Wed Jul 17 16:29:11 2013 From: shuxin.llvm at gmail.com (Shuxin Yang) Date: Wed, 17 Jul 2013 16:29:11 -0700 Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage. In-Reply-To: References: <51E087E2.5040101@gmail.com> <0DFE5B55-3FC3-435C-9971-040ED72D0328@apple.com> <7557C3F3-0682-479F-B93D-07EB2BDC16D0@apple.com> Message-ID: <51E728C7.1080908@gmail.com> On 7/17/13 4:12 PM, Nick Kledzik wrote: > On Jul 14, 2013, at 7:07 PM, Andrew Trick > wrote: >> The partitioning should be deterministic.
It’s just that the linker >> output now depends on the partitioning heuristics. As long as that >> decision is based on the input (not the host system), then it still >> meets Eric’s requirements. I just think it’s unfortunate that >> post-IPO partitioning (or more generally, parallel codegen) affects >> the output, but may be hard to avoid. It would be nice to be able to >> tune the partitioning for compile time without worrying about code >> quality. > I also want to chime in on the importance of stable binary outputs. > And not just that the same compiler and same sources produce the same binary, but > that minor changes to either should cause minor changes to the output > binary. For software updates, Apple updater tries to download only > the delta to the binaries, so we want those to be as small as > possible. In addition, it often happens late in an OS release cycle > that some critical bug is found and the fix is in the compiler. To > qualify it, we rebuild the whole OS with the new compiler, then > compare all the binaries in the OS, making sure only things related to > the bug are changed. > We can view partitioning as a "transformation". Unless the transformation is absolutely a no-op, it will change something. If we care about consistency in binaries, we either consistently use partitioning or consistently do not use partitioning. > >> The compiler used to generate a single object file from the merged >> IR, now it will generate multiple of them, one for each partition. > I have not studied the MC interface, but why does each partition need > to generate a separate object file? Why can’t the first partition > that is done create an object file, and as other partitions finish, they just > append to that object file? We could append the object files as an alternative. However, how do we know the /path/to/ld from the existing interface ABIs? How do we know the flags to feed to the ld (more often than not, "-r" alone is enough, but some linkers may need more).
In my rudimentary implementation, I hack around this by hardcoding /usr/bin/ld. I think adding objects one by one back to the linker is better, as the linker already has enough information. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kledzik at apple.com Wed Jul 17 16:53:36 2013 From: kledzik at apple.com (Nick Kledzik) Date: Wed, 17 Jul 2013 16:53:36 -0700 Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage. In-Reply-To: <51E728C7.1080908@gmail.com> References: <51E087E2.5040101@gmail.com> <0DFE5B55-3FC3-435C-9971-040ED72D0328@apple.com> <7557C3F3-0682-479F-B93D-07EB2BDC16D0@apple.com> <51E728C7.1080908@gmail.com> Message-ID: <9B50D605-1391-49A6-A888-532D4DAF592D@apple.com> On Jul 17, 2013, at 4:29 PM, Shuxin Yang wrote: > On 7/17/13 4:12 PM, Nick Kledzik wrote: >> On Jul 14, 2013, at 7:07 PM, Andrew Trick wrote: >>> The partitioning should be deterministic. It’s just that the linker output now depends on the partitioning heuristics. As long as that decision is based on the input (not the host system), then it still meets Eric’s requirements. I just think it’s unfortunate that post-IPO partitioning (or more generally, parallel codegen) affects the output, but may be hard to avoid. It would be nice to be able to tune the partitioning for compile time without worrying about code quality. >> I also want to chime in on the importance of stable binary outputs. And not just that the same compiler and same sources produce the same binary, but that minor changes to either should cause minor changes to the output binary. For software updates, Apple updater tries to download only the delta to the binaries, so we want those to be as small as possible. In addition, it often happens late in an OS release cycle that some critical bug is found and the fix is in the compiler. To qualify it, we rebuild the whole OS with the new compiler, then compare all the binaries in the OS, making sure only things related to the bug are changed.
>> > We can view partitioning as a "transformation". Unless the transformation is absolutely a no-op, > it will change something. If we care about consistency in binaries, we either consistently use partitioning > or consistently not use partitioning. But doesn’t "consistently not use partitioning” mean “don’t use the optimization you are working on”? Isn’t there some way to get the same output no matter how it is partitioned? > >> >>> The compiler used to generate a single object file from the merged >>> IR, now it will generate multiple of them, one for each partition. >> I have not studied the MC interface, but why does each partition need to generate a separate object file? Why can’t the first partition that is done create an object file, and as other partitions finish, they just append to that object file? > We could append the object files as an alternative. > However, how do we know the /path/to/ld from the existing interface ABIs? > How do we know the flags to feed to the ld (more often than not, "-r" alone is enough, > but some linkers may need more). > > In my rudimentary implementation, I hack around this by hardcoding /usr/bin/ld. > > I think adding objects one by one back to the linker is better, as the linker already has > enough information. I think you missed my point, or you are really thinking from the multi-process point of view. In LLVM there is an MCWriter used to produce object files. Your model is that if there are three partitions, then there will be three MCWriter objects, each producing an object file. What I am saying is to have only one MCWriter object and have all three partitions stream their content out through the one MCWriter, producing one object file. -Nick -------------- next part -------------- An HTML attachment was scrubbed... URL: From shuxin.llvm at gmail.com Wed Jul 17 17:03:23 2013 From: shuxin.llvm at gmail.com (Shuxin Yang) Date: Wed, 17 Jul 2013 17:03:23 -0700 Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage.
In-Reply-To: <9B50D605-1391-49A6-A888-532D4DAF592D@apple.com> References: <51E087E2.5040101@gmail.com> <0DFE5B55-3FC3-435C-9971-040ED72D0328@apple.com> <7557C3F3-0682-479F-B93D-07EB2BDC16D0@apple.com> <51E728C7.1080908@gmail.com> <9B50D605-1391-49A6-A888-532D4DAF592D@apple.com> Message-ID: <51E730CB.20303@gmail.com> On 7/17/13 4:53 PM, Nick Kledzik wrote: > > On Jul 17, 2013, at 4:29 PM, Shuxin Yang > wrote: > >> On 7/17/13 4:12 PM, Nick Kledzik wrote: >>> On Jul 14, 2013, at 7:07 PM, Andrew Trick >> > wrote: >>>> The partitioning should be deterministic. It’s just that the linker >>>> output now depends on the partitioning heuristics. As long as that >>>> decision is based on the input (not the host system), then it still >>>> meets Eric’s requirements. I just think it’s unfortunate that >>>> post-IPO partitioning (or more generally, parallel codegen) affects >>>> the output, but may be hard to avoid. It would be nice to be able >>>> to tune the partitioning for compile time without worrying about >>>> code quality. >>> I also want to chime in on the importance of stable binary outputs. >>> And not just that the same compiler and same sources produce the same binary, >>> but that minor changes to either should cause minor changes to the >>> output binary. For software updates, Apple updater tries to >>> download only the delta to the binaries, so we want those to be as >>> small as possible. In addition, it often happens late in an OS >>> release cycle that some critical bug is found and the fix is in the >>> compiler. To qualify it, we rebuild the whole OS with the new >>> compiler, then compare all the binaries in the OS, making sure only >>> things related to the bug are changed. >>> >> We can view partitioning as a "transformation". Unless the >> transformation is absolutely a no-op, >> it will change something. If we care about consistency in binaries, >> we either consistently use partitioning >> or consistently not use partitioning.
> But doesn’t "consistently not use partitioning” mean “don’t use the > optimization you are working on”? Yes > Isn’t there some way to get the same output no matter how it is > partitioned? No. Just like "cc --without-inline" and "cc --with-inline" yield different binaries. > > >> >>> >>>> The compiler used to generate a single object file from the merged >>>> IR, now it will generate multiple of them, one for each partition. >>> I have not studied the MC interface, but why does each partition >>> need to generate a separate object file? Why can’t the first >>> partition that is done create an object file, and as other partitions >>> finish, they just append to that object file? >> >> We could append the object files as an alternative. >> However, how do we know the /path/to/ld from the existing interface ABIs? >> How do we know the flags to feed to the ld (more often than not, "-r" >> alone is enough, >> but some linkers may need more). >> >> In my rudimentary implementation, I hack around this by hardcoding /usr/bin/ld. >> >> I think adding objects one by one back to the linker is better, as the >> linker already has >> enough information. > I think you missed my point, or you are really thinking from the > multi-process point of view. In LLVM there is an MCWriter used to > produce object files. Your model is that if there are three > partitions, then there will be three MCWriter objects, each producing > an object file. What I am saying is to have only one MCWriter object > and have all three partitions stream their content out through the one > MCWriter, producing one object file. > No, I don't want to sync all threads at this point. It makes things all the more complex. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From aj14889 at yahoo.com Wed Jul 17 17:11:24 2013 From: aj14889 at yahoo.com (Ali Javadi) Date: Wed, 17 Jul 2013 20:11:24 -0400 Subject: [LLVMdev] Nested Loop Unrolling Message-ID: <3A95D635-6EAA-484D-965E-A9DDFA1940D4@yahoo.com> Hi, In LLVM (using the opt tool), is it possible to force a nested loop to be unrolled entirely? Something like a pass option? I have a nested loop with a depth of 4, and all trip counts are known at compile time, but so far I've only been able to do this by 4 invocations of the -loop-simplify, -loop-rotate, -loop-unroll passes. Thanks, Ali From gribozavr at gmail.com Wed Jul 17 17:22:52 2013 From: gribozavr at gmail.com (Dmitri Gribenko) Date: Wed, 17 Jul 2013 17:22:52 -0700 Subject: [LLVMdev] [RFC] Switching make check to use 'set -o pipefail' In-Reply-To: References: Message-ID: On Thu, Jul 4, 2013 at 8:56 PM, Rafael Espíndola wrote: > We currently don't use pipefail when running tests under make check. > This has the undesirable property that it is really easy for tests to > bitrot. Hi Rafael, Did this discussion ever get a conclusion? I support enabling pipefail. Fallout for out of tree users should be easy to fix. As we learned from LLVM tests, almost all tests that start to fail actually indicate a real problem that was hidden. Dmitri -- main(i,j){for(i=2;;i++){for(j=2;j*/ From silvas at purdue.edu Wed Jul 17 18:04:08 2013 From: silvas at purdue.edu (Sean Silva) Date: Wed, 17 Jul 2013 18:04:08 -0700 Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> References: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> Message-ID: There seems to be a lot of interest recently in LTO. How do you see the situation of splitting the IR passes between per-TU processing and multi-TU ("link time") processing? -- Sean Silva -------------- next part -------------- An HTML attachment was scrubbed...
URL: From peter at pcc.me.uk Wed Jul 17 18:06:09 2013 From: peter at pcc.me.uk (Peter Collingbourne) Date: Wed, 17 Jul 2013 18:06:09 -0700 Subject: [LLVMdev] Proposal: function prefix data Message-ID: <20130718010609.GA17472@pcc.me.uk> Hi, I would like to propose that we introduce a mechanism in IR to allow arbitrary data to be stashed before a function body. The purpose of this would be to allow additional data about a function to be looked up via a function pointer. Two use cases come to mind: 1) We'd like to be able to use UBSan to check that the type of the function pointer of an indirect function call matches the type of the function being called. This can't really be done efficiently without storing type information near the function. 2) Allowing GHC's tables-next-to-code ABI [1] to be implemented. In general, I imagine this feature could be useful for the implementation of languages which require runtime metadata for each function. The proposal is that an IR function definition acquires a constant operand which contains the data to be emitted immediately before the function body (known as the prefix data). To access the data for a given function, a program may bitcast the function pointer to a pointer to the constant's type. This implies that the IR symbol points to the start of the prefix data. To maintain the semantics of ordinary function calls, the prefix data must have a particular format. Specifically, it must begin with a sequence of bytes which decode to a sequence of machine instructions, valid for the module's target, which transfer control to the point immediately succeeding the prefix data, without performing any other visible action. This allows the inliner and other passes to reason about the semantics of the function definition without needing to reason about the prefix data. Obviously this makes the format of the prefix data highly target dependent. 
This requirement could be relaxed when combined with my earlier symbol offset proposal [2] as applied to functions. However, this is outside the scope of the current proposal.

Example:

%0 = type <{ i32, i8* }>

define void @f() prefix %0 <{ i32 1413876459, i8* bitcast ({ i8*, i8* }* @_ZTIFvvE to i8*) }> {
  ret void
}

This is an example of something that UBSan might generate on an x86_64 machine. It consists of a signature of 4 bytes followed by a pointer to the RTTI data for the type 'void ()'. The signature when laid out as a little endian 32-bit integer decodes to the instruction 'jmp .+0x0c' (which jumps to the instruction immediately succeeding the 12-byte prefix) followed by the bytes 'F' and 'T' which identify the prefix as a UBSan function type prefix.

A caller might check that a given function pointer has a valid signature like this:

%4 = bitcast void ()* @f to %0*
%5 = getelementptr %0* %4, i32 0, i32 0
%6 = load i32* %5
%7 = icmp eq i32 %6, 1413876459

In the specific case above, where the function pointer is a constant, optimisation passes such as globalopt could potentially be adapted to recognise prefix data and hence replace %6 etc with a constant. (This is one reason why I decided to represent prefix data in IR rather than, say, using inline asm as proposed in the GHC thread [1].)

Thoughts?

Thanks,
-- Peter

[1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-February/047550.html
[2] http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-April/061511.html

From rafael.espindola at gmail.com Wed Jul 17 18:48:22 2013 From: rafael.espindola at gmail.com (Rafael Espíndola) Date: Wed, 17 Jul 2013 21:48:22 -0400 Subject: [LLVMdev] [RFC] Switching make check to use 'set -o pipefail' In-Reply-To: References: Message-ID: > Hi Rafael, > > Did this discussion ever get a conclusion? I support enabling > pipefail. Fallout for out of tree users should be easy to fix.
As we > learned from LLVM tests, almost all tests that start to fail actually > indicate a real problem that was hidden. So far I got some positive feedback, but no strong LGTM from someone in the area :-( > Dmitri > Cheers, Rafael From xiaofei.wan at intel.com Wed Jul 17 18:56:57 2013 From: xiaofei.wan at intel.com (Wan, Xiaofei) Date: Thu, 18 Jul 2013 01:56:57 +0000 Subject: [LLVMdev] [Proposal] Parallelize post-IPO stage. In-Reply-To: <9B50D605-1391-49A6-A888-532D4DAF592D@apple.com> References: <51E087E2.5040101@gmail.com> <0DFE5B55-3FC3-435C-9971-040ED72D0328@apple.com> <7557C3F3-0682-479F-B93D-07EB2BDC16D0@apple.com> <51E728C7.1080908@gmail.com> <9B50D605-1391-49A6-A888-532D4DAF592D@apple.com> Message-ID: <851E09B5CA368045827A32DA76E440AF01972E6C@SHSMSX104.ccr.corp.intel.com> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Nick Kledzik Sent: Thursday, July 18, 2013 7:54 AM To: Shuxin Yang Cc: LLVM Developers Mailing List Subject: Re: [LLVMdev] [Proposal] Parallelize post-IPO stage. On Jul 17, 2013, at 4:29 PM, Shuxin Yang > wrote: On 7/17/13 4:12 PM, Nick Kledzik wrote: On Jul 14, 2013, at 7:07 PM, Andrew Trick > wrote: The partitioning should be deterministic. It's just that the linker output now depends on the partitioning heuristics. As long that decision is based on the input (not the host system), then it still meets Eric's requirements. I just think it's unfortunate that post-IPO partitioning (or more generally, parallel codegen) affects the output, but may be hard to avoid. It would be nice to be able to tune the partitioning for compile time without worrying about code quality. I also want to chime in on the importance of stable binary outputs. And not just same compiler and same sources produces same binary, but that minor changes to either should cause minor changes to the output binary. 
For software updates, Apple updater tries to download only the delta to the binaries, so we want those to be as small as possible. In addition, it often happens late in an OS release cycle that some critical bug is found and the fix is in the compiler. To qualify it, we rebuild the whole OS with the new compiler, then compare all the binaries in the OS, making sure only things related to the bug are changed. We can view partitioning as a "transformation". Unless the transformation is absolutely a no-op, it will change something. If we care about consistency in binaries, we either consistently use partitioning or consistently don't. But doesn't "consistently not use partitioning" mean "don't use the optimization you are working on"? Isn't there some way to get the same output no matter how it is partitioned? The compiler used to generate a single object file from the merged IR; now it will generate multiple of them, one for each partition. I have not studied the MC interface, but why does each partition need to generate a separate object file? Why can't the first partition that finishes create an object file, and as other partitions finish, they just append to that object file? We could append the object files as an alternative. However, how do we know the /path/to/ld from the existing interface ABIs? How do we know the flags fed to the ld (more often than not, "-r" alone is enough, but some linkers may need more)? In my rudimentary implementation, I hack by hardcoding to /usr/bin/ld. I think adding objects one by one back to the linker is better, as the linker already has enough information. I think you missed my point, or you are really thinking from the multi-process point of view. In LLVM there is an MCWriter used to produce object files. Your model is that if there are three partitions, then there will be three MCWriter objects, each producing an object file.
What I am saying is to have only one MCWriter object and have all three partitions stream their content out through the one MCWriter, producing one object file. [Xiaofei] if you only target parallelizing post-LTO passes, there is no need to partition; parallelizing function passes is enough and could achieve better performance (no need to link partitions together, since they share the AsmPrinter & MCWriter) -Nick -------------- next part -------------- An HTML attachment was scrubbed... URL: From shuxin.llvm at gmail.com Wed Jul 17 19:09:54 2013 From: shuxin.llvm at gmail.com (Shuxin Yang) Date: Wed, 17 Jul 2013 19:09:54 -0700 Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: References: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> Message-ID: <51E74E72.5000902@gmail.com> Andy and I briefly discussed this the other day; we have not yet had a chance to list a detailed pass order for the pre- and post-IPO scalar optimizations. This is the wish-list in our mind: pre-IPO: based on the ordering he proposes, get rid of the inlining (or just inline tiny funcs), get rid of all loop xforms... post-IPO: get rid of inlining, or maybe we still need it, only performing inlining on callees which have now become tiny; enable the loop xforms. The SCC pass manager seems to be important for inlining; no matter what the inlining looks like in the future, I think the pass manager is still useful for scalar opt. It enables us to achieve cheap inter-procedural opt hands down, in the sense that we can optimize a callee, analyze it, and feed whatever detailed info back to the caller (say, info like "the callee already returns constant 5" or "the callee's return value is in 5-10"; such info is difficult to obtain at the IPO stage, as it cannot afford to take such a close look). I think it is too early to discuss the pre-IPO and post-IPO thing; let us focus on what Andy is proposing. On 7/17/13 6:04 PM, Sean Silva wrote: > There seems to be a lot of interest recently in LTO.
How do you see > the situation of splitting the IR passes between per-TU processing and > multi-TU ("link time") processing? > > -- Sean Silva > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From brianherman at gmail.com Wed Jul 17 19:46:31 2013 From: brianherman at gmail.com (Brian Herman) Date: Wed, 17 Jul 2013 21:46:31 -0500 Subject: [LLVMdev] Try Emscripten in the browser Message-ID: Here is a little toy I made in a couple days http://kompile.org/ -- Thanks, Brian Herman college.nfshost.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From atrick at apple.com Wed Jul 17 19:49:19 2013 From: atrick at apple.com (Andrew Trick) Date: Wed, 17 Jul 2013 19:49:19 -0700 Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> References: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> Message-ID: <0F593DFC-754F-4ADA-A2C2-EA8E48F3E67D@apple.com> On Jul 16, 2013, at 9:38 PM, Andrew Trick wrote:

> IR Canonicalization Pipeline:
>
> Function Passes {
>   SimplifyCFG
>   SROA-1
>   EarlyCSE
> }
> Call-Graph SCC Passes {
>   Inline
>   Function Passes {
>     EarlyCSE
>     SimplifyCFG
>     InstCombine
>     Early Loop Opts {
>       LoopSimplify
>       Rotate (when obvious)
>       Full-Unroll (when obvious)
>     }
>     SROA-2
>     InstCombine
>     GVN ...

I should explain: SROA-1 and SROA-2 are not necessarily different versions of SROA (though they could be in the future), I just wanted to be clear that it is run twice. -Andy -------------- next part -------------- An HTML attachment was scrubbed...
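[Editor's note] Andy's canonicalization listing above maps roughly onto the legacy pass flags that opt exposed around the 3.3 era. The flat command below is only an approximation sketched for orientation: a flag list cannot reproduce the nested function-pass/CGSCC iteration of the real pass manager, and the file names are placeholders.

```shell
opt -simplifycfg -sroa -early-cse \
    -inline \
    -early-cse -simplifycfg -instcombine \
    -loop-simplify -loop-rotate -loop-unroll \
    -sroa -instcombine -gvn \
    input.bc -o canonical.bc
```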
URL: From silvas at purdue.edu Wed Jul 17 19:50:58 2013 From: silvas at purdue.edu (Sean Silva) Date: Wed, 17 Jul 2013 19:50:58 -0700 Subject: [LLVMdev] Proposal: function prefix data In-Reply-To: <20130718010609.GA17472@pcc.me.uk> References: <20130718010609.GA17472@pcc.me.uk> Message-ID: On Wed, Jul 17, 2013 at 6:06 PM, Peter Collingbourne wrote: > Hi, > > I would like to propose that we introduce a mechanism in IR to allow > arbitrary data to be stashed before a function body. The purpose of > this would be to allow additional data about a function to be looked > up via a function pointer. Two use cases come to mind: > > 1) We'd like to be able to use UBSan to check that the type of the > function pointer of an indirect function call matches the type of > the function being called. This can't really be done efficiently > without storing type information near the function. > How efficient does it have to be? Have some alternatives already proven to be "too slow"? (e.g. a binary search into a sorted table) > > 2) Allowing GHC's tables-next-to-code ABI [1] to be implemented. > In general, I imagine this feature could be useful for the > implementation of languages which require runtime metadata for > each function. > > The proposal is that an IR function definition acquires a constant > operand which contains the data to be emitted immediately before > the function body (known as the prefix data). To access the data > for a given function, a program may bitcast the function pointer to > a pointer to the constant's type. This implies that the IR symbol > points to the start of the prefix data. > > To maintain the semantics of ordinary function calls, the prefix data > must have a particular format. Specifically, it must begin with a > sequence of bytes which decode to a sequence of machine instructions, > valid for the module's target, which transfer control to the point > immediately succeeding the prefix data, without performing any other > visible action. 
This allows the inliner and other passes to reason > about the semantics of the function definition without needing to > reason about the prefix data. Obviously this makes the format of the > prefix data highly target dependent. > I'm not sure that something this target dependent is the right choice. Your example below suggests that the frontend would then need to know magic to put "raw" in the instruction stream. Have you considered having the feature expose just the intent "store this data attached to the function, to be accessed very quickly", and then have an intrinsic ("llvm.getfuncdata.i{8,16,32,64}"?) which extracts the data in a target-dependent way? Forcing clients to embed deep target-specific-machine-code knowledge in their frontends seems like a step in the wrong direction for LLVM. > > This requirement could be relaxed when combined with my earlier symbol > offset proposal [2] as applied to functions. However, this is outside > the scope of the current proposal. > > Example: > > %0 = type <{ i32, i8* }> > > define void @f() prefix %0 <{ i32 1413876459, i8* bitcast ({ i8*, i8* }* > @_ZTIFvvE to i8*) }> { > ret void > } > > This is an example of something that UBSan might generate on an > x86_64 machine. It consists of a signature of 4 bytes followed by a > pointer to the RTTI data for the type 'void ()'. The signature when > laid out as a little endian 32-bit integer decodes to the instruction > 'jmp .+0x0c' (which jumps to the instruction immediately succeeding > the 12-byte prefix) followed by the bytes 'F' and 'T' which identify > the prefix as a UBSan function type prefix. > Do you know whether OoO CPU's will still attempt to decode the "garbage" in the instruction stream, even if there is a jump over it? (IIRC they will decode ahead of the PC and hiccup (but not fault) on garbage). Maybe it would be better to steganographically encode the value inside the instruction stream? 
On x86 you could use 48b8 which only has 2 bytes overhead for an i64 (putting a move like that, which moves into a caller-saved register on entry, would effectively be a noop). This is some pretty gnarly target-dependent stuff which seems like it would best be hidden in the backend (e.g. architectures that have "constant island"-like passes might want to stash the data in there instead). -- Sean Silva -------------- next part -------------- An HTML attachment was scrubbed... URL: From atrick at apple.com Wed Jul 17 19:59:18 2013 From: atrick at apple.com (Andrew Trick) Date: Wed, 17 Jul 2013 19:59:18 -0700 Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: <51E74E72.5000902@gmail.com> References: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> <51E74E72.5000902@gmail.com> Message-ID: <571FF076-0627-48BF-A847-EDFE2A6E3425@apple.com> On Jul 17, 2013, at 7:09 PM, Shuxin Yang wrote: > Andy and I briefly discussed this the other day, we have not yet got chance to list a detailed pass order > for the pre- and post- IPO scalar optimizations. > > This is wish-list in our mind: > > pre-IPO: based on the ordering he propose, get rid of the inlining (or just inline tiny func), get rid of > all loop xforms... > > post-IPO: get rid of inlining, or maybe we still need it, only perform the inling to to callee which now become tiny. > enable the loop xforms. > > The SCC pass manager seems to be important inling, no matter how the inling looks like in the future, > I think the passmanager is still useful for scalar opt. It enable us to achieve cheap inter-procedural > opt hands down in the sense that we can optimize callee, analyze it, and feedback the detailed whatever > info back to caller (say info like "the callee already return constant 5", the "callee return value in 5-10", > and such info is difficult to obtain and IPO stage, as it can not afford to take such closer look. 
> > I think it is too early to discuss the pre-IPO and post-IPO thing, let us focus on what Andy is proposing. Right. We don’t need to debate the specifics of the LTO pipeline yet, particularly in this thread. The important thing is that the function passes will run in the same order for LTO, so we should probably factor PassManagerBuilder to make that easy. The difference with LTO is that some of the canonical IR passes will end up running at least twice, in both pre and post IPO. The lowering IR passes will only run in post IPO. -Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: From Milind.Chabbi at rice.edu Wed Jul 17 21:01:17 2013 From: Milind.Chabbi at rice.edu (Milind Chabbi) Date: Wed, 17 Jul 2013 21:01:17 -0700 Subject: [LLVMdev] About LLVM switch instruction Message-ID: I am performing a transformation that requires changing the targets of a basic block ending with a switch instruction. In particular, I need to delete the edge that goes to the "default" basic block. But, LLVM switch instruction always wants a default target basic block for a switch instruction. It is not clear how to accomplish this, since I don't have a replacement default target block. I could potentially fake that edge to be one of the other case label targets, but that is an ugly hack and I don't want to do that. I would appreciate if you can suggest better alternatives. 
-Milind From chanakya.sun at gmail.com Wed Jul 17 07:53:29 2013 From: chanakya.sun at gmail.com (Venkata Suneel Kota) Date: Wed, 17 Jul 2013 20:23:29 +0530 Subject: [LLVMdev] regarding compiling clang for different platform Message-ID: Hi, I am new to LLVM. I want to use llvm and clang on Android. I have downloaded the Android toolchain and configured llvm using the following command:

./configure --build=arm-linux-androideabi --host=arm-linux-androideabi --target=arm-linux-androideabi --with-float=hard --with-fpu=neon --enable-targets=arm --enable-optimized --enable-assertions

and was getting the error "checking build system type... Invalid configuration `arm-linux-androideabi': system `androideabi' not recognized configure: error: /bin/bash autoconf/config.sub arm-linux-androideabi failed"

I modified the command available at the following link http://llvm.org/releases/3.3/docs/HowToBuildOnARM.html

./configure --build=armv7l-unknown-linux-gnueabihf \
  --host=armv7l-unknown-linux-gnueabihf \
  --target=armv7l-unknown-linux-gnueabihf --with-cpu=cortex-a9 \
  --with-float=hard --with-abi=aapcs-vfp --with-fpu=neon \
  --enable-targets=arm --enable-optimized --enable-assertions

Can anyone help me with how to do it? Thanks in Advance.. Have A Nice Day..... Thanks & Regards chanakya -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark.lacey at apple.com Wed Jul 17 21:29:18 2013 From: mark.lacey at apple.com (Mark Lacey) Date: Wed, 17 Jul 2013 21:29:18 -0700 Subject: [LLVMdev] About LLVM switch instruction In-Reply-To: References: Message-ID: <71D0FF69-9F1C-47EF-A197-38059D9F2F1D@apple.com> On Jul 17, 2013, at 9:01 PM, Milind Chabbi wrote: > I am performing a transformation that requires changing the targets of > a basic block ending with a switch instruction. > In particular, I need to delete the edge that goes to the "default" > basic block.
> But, LLVM switch instruction always wants a default target basic block > for a switch instruction. > It is not clear how to accomplish this, since I don't have a > replacement default target block. > I could potentially fake that edge to be one of the other case label > targets, but that is an ugly hack and I don't want to do that. > I would appreciate if you can suggest better alternatives. Hi Milind, If you make the "default" branch to a block that has an UnreachableInst as a terminator, the SimplifyCFG pass will remove one of the switch cases and replace the block that the default branches to with the block that this removed case branches to. This sounds a lot like the "ugly hack" that you would like to avoid. Would it be a reasonable solution for what you are trying to accomplish? Mark From Milind.Chabbi at rice.edu Wed Jul 17 22:09:37 2013 From: Milind.Chabbi at rice.edu (Milind Chabbi) Date: Wed, 17 Jul 2013 22:09:37 -0700 Subject: [LLVMdev] About LLVM switch instruction In-Reply-To: <71D0FF69-9F1C-47EF-A197-38059D9F2F1D@apple.com> References: <71D0FF69-9F1C-47EF-A197-38059D9F2F1D@apple.com> Message-ID: Hi Mark, This will work around the problem of the "default" branch restriction on the switch instruction. The trouble with this technique is that it will hamper later optimization phases such as constant propagation. When a block was part of a case, because of the knowledge of the case value, the block was a candidate for better optimization. However, when we move the body of the case into the default, the knowledge of the case value is lost and the body is less optimizable. -Milind On Wed, Jul 17, 2013 at 9:29 PM, Mark Lacey wrote: > On Jul 17, 2013, at 9:01 PM, Milind Chabbi wrote: >> I am performing a transformation that requires changing the targets of >> a basic block ending with a switch instruction. >> In particular, I need to delete the edge that goes to the "default" >> basic block.
>> But, LLVM switch instruction always wants a default target basic block >> for a switch instruction. >> It is not clear how to accomplish this, since I don't have a >> replacement default target block. >> I could potentially fake that edge to be one of the other case label >> targets, but that is an ugly hack and I don't want to do that. >> I would appreciate if you can suggest better alternatives. > > Hi Milind, > > If you make the "default" branch to a block that has an UnreachableInst as a terminator, the SimplifyCFG pass will remove one of the switch cases and replace the block that the default branches to with the block that this removed case branches to. This sounds a lot like the "ugly hack" that you would like to avoid. Would it be a reasonable solution for what you are trying to accomplish? > > Mark > > From d.albuschat at gmail.com Wed Jul 17 22:21:37 2013 From: d.albuschat at gmail.com (Daniel Albuschat) Date: Thu, 18 Jul 2013 07:21:37 +0200 Subject: [LLVMdev] [cfe-dev] design for an accurate ODR-checker with clang In-Reply-To: References: <20841B25-DA88-44B5-AC65-9202C6FDA0E9@apple.com> <04D075BA-2997-4718-90F7-C76773937BB1@apple.com> <013C5CDA-D3BB-4E05-9B30-9963949348FA@apple.com> Message-ID: 2013/7/16 JF Bastien > I've debugged enough debug+release mixing issues > shared-libraries to feel that pain. > ACK. It's not only debug+release, but very often an outdated object file for me. Because the build system didn't pick up code changes in some headers and did not re-compile compilation units that were affected. (Hello Visual Studio, it's you that I am looking at.) Would have saved me numerous hours. Greetings, Daniel Albuschat -------------- next part -------------- An HTML attachment was scrubbed... 
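[Editor's note] Mark's UnreachableInst suggestion in the switch thread above looks roughly like this in IR (a sketch; the function and block names are hypothetical):

```llvm
define i32 @example(i32 %i) {
entry:
  switch i32 %i, label %default [
    i32 0, label %case0
    i32 1, label %case1
  ]
case0:
  ret i32 0
case1:
  ret i32 1
default:
  ; With the default block ending in unreachable, SimplifyCFG can fold
  ; one of the real cases into the default slot, removing the dead edge.
  unreachable
}
```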
URL: From mark.lacey at apple.com Wed Jul 17 22:28:22 2013 From: mark.lacey at apple.com (Mark Lacey) Date: Wed, 17 Jul 2013 22:28:22 -0700 Subject: [LLVMdev] About LLVM switch instruction In-Reply-To: References: <71D0FF69-9F1C-47EF-A197-38059D9F2F1D@apple.com> Message-ID: On Jul 17, 2013, at 10:09 PM, Milind Chabbi wrote: > Hi Mark, > > This will workaround the problem of "default" branch restriction on > the switch instruction. The trouble with this technique is that it > will trump later optimization phases such as constant propagation. > When a block was part of a case, because of the knowledge of the case > value, the block was a candidate for better optimization. However, > when we move the body of the case into the default, the knowledge of > the case value is lost and the body is less optimizable. Yes, it is not ideal for a variety of reasons, and I am actually looking at improving how we deal with unreachable switch defaults now because of that. Could you provide any additional detail about the transforms you are doing and what you are trying to accomplish? Mark From etherzhhb at gmail.com Wed Jul 17 23:11:40 2013 From: etherzhhb at gmail.com (Hongbin Zheng) Date: Thu, 18 Jul 2013 14:11:40 +0800 Subject: [LLVMdev] About LLVM switch instruction In-Reply-To: References: <71D0FF69-9F1C-47EF-A197-38059D9F2F1D@apple.com> Message-ID: Hi Milind, Maybe you could annotate the default case value as metadata to the switch instruction. Thanks Hongbin On Thu, Jul 18, 2013 at 1:09 PM, Milind Chabbi wrote: > Hi Mark, > > This will workaround the problem of "default" branch restriction on > the switch instruction. The trouble with this technique is that it > will trump later optimization phases such as constant propagation. > When a block was part of a case, because of the knowledge of the case > value, the block was a candidate for better optimization.
However, > when we move the body of the case into the default, the knowledge of > the case value is lost and the body is less optimizable. > > -Milind > > > On Wed, Jul 17, 2013 at 9:29 PM, Mark Lacey wrote: > > On Jul 17, 2013, at 9:01 PM, Milind Chabbi > wrote: > >> I am performing a transformation that requires changing the targets of > >> a basic block ending with a switch instruction. > >> In particular, I need to delete the edge that goes to the "default" > >> basic block. > >> But, LLVM switch instruction always wants a default target basic block > >> for a switch instruction. > >> It is not clear how to accomplish this, since I don't have a > >> replacement default target block. > >> I could potentially fake that edge to be one of the other case label > >> targets, but that is an ugly hack and I don't want to do that. > >> I would appreciate if you can suggest better alternatives. > > > > Hi Milind, > > > > If you make the "default" branch to a block that has an UnreachableInst > as a terminator, the SimplifyCFG pass will remove one of the switch cases > and replace the block that the default branches to with the block that this > removed case branches to. This sounds a lot like the "ugly hack" that you > would like to avoid. Would it be a reasonable solution for what you are > trying to accomplish? > > > > Mark > > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Milind.Chabbi at rice.edu Wed Jul 17 23:17:01 2013 From: Milind.Chabbi at rice.edu (Milind Chabbi) Date: Wed, 17 Jul 2013 23:17:01 -0700 Subject: [LLVMdev] About LLVM switch instruction In-Reply-To: References: <71D0FF69-9F1C-47EF-A197-38059D9F2F1D@apple.com> Message-ID: Mark, I am basically trying to do a specialized form of unreachable block elimination. 
I bumped into this issue when I was thinking of eliminating unreachable cases (including default). In the following code, assertion propagation should easily infer that the default is unreachable. But, llvm at -O3 leaves the code in default intact. Your technique of placing "unreachable" instruction in the default case makes a difference though.

--------------------------------------------------
int foo(unsigned int i, unsigned int j) {
  if (i >= 0 && i < 3) {
    switch (i) {
    case 0: i = i + j;
    case 1: i = i + j;
    case 2: i = i + j;
      break;
    default: i = i * j;
    }
  }
  return i;
}
--------------------------------------------------

On Wed, Jul 17, 2013 at 10:28 PM, Mark Lacey wrote: > > On Jul 17, 2013, at 10:09 PM, Milind Chabbi wrote: >> Hi Mark, >> >> This will workaround the problem of "default" branch restriction on >> the switch instruction. The trouble with this technique is that it >> will trump later optimization phases such as constant propagation. >> When a block was part of a case, because of the knowledge of the case >> value, the block was a candidate for better optimization. However, >> when we move the body of the case into the default, the knowledge of >> the case value is lost and the body is less optimizable. > > Yes, it is not ideal for a variety of reasons, and I am actually looking at improving how we deal with unreachable switch defaults now because of that. > > Could you provide any additional detail about the transforms you are doing and what you are trying to accomplish? > > Mark > > From peter at uformia.com Wed Jul 17 23:23:41 2013 From: peter at uformia.com (Peter Newman) Date: Thu, 18 Jul 2013 16:23:41 +1000 Subject: [LLVMdev] SIMD instructions and memory alignment on X86 In-Reply-To: <7CA02EA7-A9F2-4A91-A10B-24ED35111F3F@cs.stanford.edu> References: <51E5F5ED.1000808@uformia.com> <7CA02EA7-A9F2-4A91-A10B-24ED35111F3F@cs.stanford.edu> Message-ID: <51E789ED.2080509@uformia.com> Unfortunately, this doesn't appear to be the bug I'm hitting.
I applied the fix to my source and it didn't make a difference. Also further testing found me getting the same behavior with other SIMD instructions. The common factor is that in each case, ECX is set to 0x7fffffff, and it's an operation using xmm ptr ecx+offset. Additionally, turning the optimization level passed to createJIT down appears to avoid it, so I'm now leaning towards a bug in one of the optimization passes. I'm going to dig through the passes controlled by that parameter and see if I can narrow down which optimization is causing it. Peter N On 17/07/2013 1:58 PM, Solomon Boulos wrote: > As someone off list just told me, perhaps my new bug is the same issue: > > http://llvm.org/bugs/show_bug.cgi?id=16640 > > Do you happen to be using FastISel? > > Solomon > > On Jul 16, 2013, at 6:39 PM, Peter Newman wrote: > >> Hello all, >> >> I'm currently in the process of debugging a crash occurring in our program. In LLVM 3.2 and 3.3 it appears that JIT generated code is attempting to access unaligned memory with a SSE2 instruction. However this only happens under certain conditions that seem (but may not be) related to the stack's state on calling the function. >> >> Our program acts as a front-end, using the LLVM C++ API to generate a JIT generated function. This function is primarily mathematical, so we use the Vector types to take advantage of SIMD instructions (as well as a few SSE2 intrinsics). >> >> This worked in LLVM 2.8 but started failing in 3.2 and has continued to fail in 3.3. It fails with no optimizations applied to the LLVM Function/Module. It crashes with what is reported as a memory access error (accessing 0xffffffff), however it's suggested that this is how the SSE fault raising mechanism appears. >> >> The generated instruction varies, but it seems to often be similar to (I don't have it in front of me, sorry): >> movapd xmm0, xmm[ecx+0x???????] >> Where the xmm register changes, and the second parameter is a memory access.
>> ECX is always set to 0x7ffffff - however I don't know if this is part of the SSE error reporting process or is part of the situation causing the error. >> >> I haven't worked out exactly what code path etc is causing this crash. I'm hoping that someone can tell me if there were any changed requirements for working with SIMD in LLVM 3.2 (or earlier, we haven't tried 3.0 or 3.1). I currently suspect the use of GlobalVariable (we first discovered the crash when using a feature that uses them), however I have attempted using setAlignment on the GlobalVariables without any change. >> >> -- >> Peter N >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From Milind.Chabbi at rice.edu Wed Jul 17 23:30:55 2013 From: Milind.Chabbi at rice.edu (Milind Chabbi) Date: Wed, 17 Jul 2013 23:30:55 -0700 Subject: [LLVMdev] About LLVM switch instruction In-Reply-To: References: <71D0FF69-9F1C-47EF-A197-38059D9F2F1D@apple.com> Message-ID: Hongbin Can you elaborate more on your suggestion? I am not sure I fully understand what you suggested. -Milind On Wed, Jul 17, 2013 at 11:11 PM, Hongbin Zheng wrote: > Hi Milind, > > Maybe you could annotate the default case value as metadata to the swith > instruction. > > Thanks > Hongbin > > > On Thu, Jul 18, 2013 at 1:09 PM, Milind Chabbi > wrote: >> >> Hi Mark, >> >> This will workaround the problem of "default" branch restriction on >> the switch instruction. The trouble with this technique is that it >> will trump later optimization phases such as constant propagation. >> When a block was part of a case, because of the knowledge of the case >> value, the block was a candidate for better optimization. However, >> when we move the body of the case into the default, the knowledge of >> the case value is lost and the body is less optimizable. 
>> >> -Milind >> >> >> On Wed, Jul 17, 2013 at 9:29 PM, Mark Lacey wrote: >> > On Jul 17, 2013, at 9:01 PM, Milind Chabbi >> > wrote: >> >> I am performing a transformation that requires changing the targets of >> >> a basic block ending with a switch instruction. >> >> In particular, I need to delete the edge that goes to the "default" >> >> basic block. >> >> But, LLVM switch instruction always wants a default target basic block >> >> for a switch instruction. >> >> It is not clear how to accomplish this, since I don't have a >> >> replacement default target block. >> >> I could potentially fake that edge to be one of the other case label >> >> targets, but that is an ugly hack and I don't want to do that. >> >> I would appreciate if you can suggest better alternatives. >> > >> > Hi Milind, >> > >> > If you make the "default" branch to a block that has an UnreachableInst >> > as a terminator, the SimplifyCFG pass will remove one of the switch cases >> > and replace the block that the default branches to with the block that this >> > removed case branches to. This sounds a lot like the "ugly hack" that you >> > would like to avoid. Would it be a reasonable solution for what you are >> > trying to accomplish? >> > >> > Mark >> > >> > >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > From craig.topper at gmail.com Wed Jul 17 23:37:01 2013 From: craig.topper at gmail.com (Craig Topper) Date: Wed, 17 Jul 2013 23:37:01 -0700 Subject: [LLVMdev] SIMD instructions and memory alignment on X86 In-Reply-To: <51E789ED.2080509@uformia.com> References: <51E5F5ED.1000808@uformia.com> <7CA02EA7-A9F2-4A91-A10B-24ED35111F3F@cs.stanford.edu> <51E789ED.2080509@uformia.com> Message-ID: Are you able to send any IR for others to reproduce this issue? 
On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman wrote: > Unfortunately, this doesn't appear to be the bug I'm hitting. I applied > the fix to my source and it didn't make a difference. > > Also further testing found me getting the same behavior with other SIMD > instructions. The common factor is in each case, ECX is set to 0x7fffffff, > and it's an operation using xmm ptr ecx+offset . > > Additionally, turning the optimization level passed to createJIT down > appears to avoid it, so I'm now leaning towards a bug in one of the > optimization passes. > > I'm going to dig through the passes controlled by that parameter and see > if I can narrow down which optimization is causing it. > > Peter N > > > On 17/07/2013 1:58 PM, Solomon Boulos wrote: > >> As someone off list just told me, perhaps my new bug is the same issue: >> >> http://llvm.org/bugs/show_bug.**cgi?id=16640 >> >> Do you happen to be using FastISel? >> >> Solomon >> >> On Jul 16, 2013, at 6:39 PM, Peter Newman wrote: >> >> Hello all, >>> >>> I'm currently in the process of debugging a crash occurring in our >>> program. In LLVM 3.2 and 3.3 it appears that JIT generated code is >>> attempting to perform access unaligned memory with a SSE2 instruction. >>> However this only happens under certain conditions that seem (but may not >>> be) related to the stacks state on calling the function. >>> >>> Our program acts as a front-end, using the LLVM C++ API to generate a >>> JIT generated function. This function is primarily mathematical, so we use >>> the Vector types to take advantage of SIMD instructions (as well as a few >>> SSE2 intrinsics). >>> >>> This worked in LLVM 2.8 but started failing in 3.2 and has continued to >>> fail in 3.3. It fails with no optimizations applied to the LLVM >>> Function/Module. It crashes with what is reported as a memory access error >>> (accessing 0xffffffff), however it's suggested that this is how the SSE >>> fault raising mechanism appears. 
>>> >>> The generated instruction varies, but it seems to often be similar to (I >>> don't have it in front of me, sorry): >>> movapd xmm0, xmm[ecx+0x???????] >>> Where the xmm register changes, and the second parameter is a memory >>> access. >>> ECX is always set to 0x7ffffff - however I don't know if this is part of >>> the SSE error reporting process or is part of the situation causing the >>> error. >>> >>> I haven't worked out exactly what code path etc is causing this crash. >>> I'm hoping that someone can tell me if there were any changed requirements >>> for working with SIMD in LLVM 3.2 (or earlier, we haven't tried 3.0 or >>> 3.1). I currently suspect the use of GlobalVariable (we first discovered >>> the crash when using a feature that uses them), however I have attempted >>> using setAlignment on the GlobalVariables without any change. >>> >>> -- >>> Peter N >>> ______________________________**_________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/**mailman/listinfo/llvmdev >>> >> > ______________________________**_________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/**mailman/listinfo/llvmdev > -- ~Craig -------------- next part -------------- An HTML attachment was scrubbed... URL: From etherzhhb at gmail.com Wed Jul 17 23:58:21 2013 From: etherzhhb at gmail.com (Hongbin Zheng) Date: Thu, 18 Jul 2013 14:58:21 +0800 Subject: [LLVMdev] About LLVM switch instruction In-Reply-To: References: <71D0FF69-9F1C-47EF-A197-38059D9F2F1D@apple.com> Message-ID: Hi Milind, My suggestion just for your concern that if you eliminate the default block, a block associated with a case value will become the default block of the swhich instruction, since a switch instruction always requires a default block. 
But when a block associated with a case value become the default block, the associated case value is lost and may confuse the later optimizations such as constant propagation. To prevent such information lost when you eliminate the default block and make a block associated with a case value will become the default block, you can attach a metadata[1] to the switch instruction to provide the case value of the default block. In order to take the advantage of the attached metadata for the default case of the switch instruction you also need to modify the later optimization accordingly. Thanks Hongbin [1]http://blog.llvm.org/2010/04/extensible-metadata-in-llvm-ir.html On Thu, Jul 18, 2013 at 2:30 PM, Milind Chabbi wrote: > Hongbin > > Can you elaborate more on your suggestion? I am not sure I fully > understand what you suggested. > > -Milind > > On Wed, Jul 17, 2013 at 11:11 PM, Hongbin Zheng > wrote: > > Hi Milind, > > > > Maybe you could annotate the default case value as metadata to the swith > > instruction. > > > > Thanks > > Hongbin > > > > > > On Thu, Jul 18, 2013 at 1:09 PM, Milind Chabbi > > wrote: > >> > >> Hi Mark, > >> > >> This will workaround the problem of "default" branch restriction on > >> the switch instruction. The trouble with this technique is that it > >> will trump later optimization phases such as constant propagation. > >> When a block was part of a case, because of the knowledge of the case > >> value, the block was a candidate for better optimization. However, > >> when we move the body of the case into the default, the knowledge of > >> the case value is lost and the body is less optimizable. > >> > >> -Milind > >> > >> > >> On Wed, Jul 17, 2013 at 9:29 PM, Mark Lacey > wrote: > >> > On Jul 17, 2013, at 9:01 PM, Milind Chabbi > >> > wrote: > >> >> I am performing a transformation that requires changing the targets > of > >> >> a basic block ending with a switch instruction. 
> >> >> In particular, I need to delete the edge that goes to the "default" > >> >> basic block. > >> >> But, LLVM switch instruction always wants a default target basic > block > >> >> for a switch instruction. > >> >> It is not clear how to accomplish this, since I don't have a > >> >> replacement default target block. > >> >> I could potentially fake that edge to be one of the other case label > >> >> targets, but that is an ugly hack and I don't want to do that. > >> >> I would appreciate if you can suggest better alternatives. > >> > > >> > Hi Milind, > >> > > >> > If you make the "default" branch to a block that has an > UnreachableInst > >> > as a terminator, the SimplifyCFG pass will remove one of the switch > cases > >> > and replace the block that the default branches to with the block > that this > >> > removed case branches to. This sounds a lot like the "ugly hack" that > you > >> > would like to avoid. Would it be a reasonable solution for what you > are > >> > trying to accomplish? > >> > > >> > Mark > >> > > >> > > >> > >> _______________________________________________ > >> LLVM Developers mailing list > >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevin.streit at googlemail.com Thu Jul 18 00:15:53 2013 From: kevin.streit at googlemail.com (Kevin Streit) Date: Thu, 18 Jul 2013 09:15:53 +0200 Subject: [LLVMdev] eclipse and gdb In-Reply-To: <51E69748.7010003@stefant.org> References: <51E528ED.5060307@mips.com> <8CC341AF-45FF-4C84-9E0A-D5143A6D20CD@apple.com> <51E5EAFD.3020203@mips.com> <51E5ECDA.5000107@mips.com> <51E5F130.1080400@mips.com> <33D99FA7-D537-4F06-B5D9-29C4B98FDCDE@apple.com> <51E69748.7010003@stefant.org> Message-ID: <51E79629.8070806@googlemail.com> Hi, From time to time I am using eclipse as well to work on llvm. 
CDT and in particular the indexer are quite demanding when it comes to memory consumption and I experienced that the default maximum heap size is not enough for that and will eventually lead to freezes or similar. After increasing the maximum heap size in the eclipse.ini [1] (comes with your eclipse distribution; in Mac it is contained in the Application bundle: Eclipse.app/Contents/MacOS), say to 3G, I experienced no problems and did not have to exclude any files from indexing. Cheers, Kevin [1] http://wiki.eclipse.org/FAQ_How_do_I_increase_the_heap_size_available_to_Eclipse%3F -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 4273 bytes Desc: S/MIME Cryptographic Signature URL: From renato.golin at linaro.org Thu Jul 18 00:37:46 2013 From: renato.golin at linaro.org (Renato Golin) Date: Thu, 18 Jul 2013 08:37:46 +0100 Subject: [LLVMdev] regarding compiling clang for different platform In-Reply-To: References: Message-ID: Hi Venkata, Some folks are working on Clang+LLVM to work on Android on a separate project called LLVMLinux: http://llvm.linuxfoundation.org/index.php/Main_Page There you can find a lot of information on how to build Android with Clang, and find the modified trees for the kernel and userland, as well as LLVM to compile and boot on your device. Feel free to join the mailing list and ask the same question, I'm sure they'll be able to help. cheers, --renato On 17 July 2013 15:53, Venkata Suneel Kota wrote: > Hi, > I am new to LLVM > I want to use llvm and clang on Android, I have downloaded android > toolchain and did the configure for llvm using the following commad > ./configure --build=arm-linux-androideabi --host=arm-linux-androideabi > --target=arm-linux-androideabi --with-float=hard --with-fpu=neon > --enable-targets=arm --enable-optimized --enable-assertions > > and was getting the error > "checking build system type... 
Invalid configuration > `arm-linux-androideabi': system `androideabi' not recognized > configure: error: /bin/bash autoconf/config.sub arm-linux-androideabi > failed" > > i modified the command available in the following link > http://llvm.org/releases/3.3/docs/HowToBuildOnARM.html > > > ./configure --build=armv7l-unknown-linux-gnueabihf \ > --host=armv7l-unknown-linux-gnueabihf \ > --target=armv7l-unknown-linux-gnueabihf --with-cpu=cortex-a9 \ > --with-float=hard --with-abi=aapcs-vfp --with-fpu=neon \ > --enable-targets=arm --enable-optimized --enable-assertions > > > can any one help me on how to do it... > Thanks in Advance.. > Have A Nice Day..... > > Thanks & Regards > chanakya > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From t.p.northover at gmail.com Thu Jul 18 00:50:33 2013 From: t.p.northover at gmail.com (Tim Northover) Date: Thu, 18 Jul 2013 08:50:33 +0100 Subject: [LLVMdev] regarding compiling clang for different platform In-Reply-To: References: Message-ID: Hi Venkata, > ./configure --build=arm-linux-androideabi --host=arm-linux-androideabi > --target=arm-linux-androideabi --with-float=hard --with-fpu=neon > --enable-targets=arm --enable-optimized --enable-assertions Renato's suggestion is a good one, but I suspect the immediate problem here is that this configure line is for building *on* an ARM board. You're probably executing this on your x86 desktop, which means the --build option wouldn't work. Also, unless you actually want to run the resulting Clang on Android (rather than just use it to compile for Android), the --host isn't needed either. But then you'll hit the fun of trying to get Clang to find Android's headers and libraries, which is where Renato's suggestion becomes even better. 
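Tim's point as a concrete sketch: when cross-compiling from an x86 desktop, only --target should name the ARM triple, because --build and --host describe the machine the build itself runs on. A hedged, untested example invocation, reusing the options from the original message:

```
# Cross-compiling *for* Android from an x86 host: drop --build/--host.
./configure --target=arm-linux-androideabi \
            --enable-targets=arm --enable-optimized --enable-assertions
```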
We're not really Android experts here (mostly). Cheers. Tim. From renato.golin at linaro.org Thu Jul 18 01:14:57 2013 From: renato.golin at linaro.org (Renato Golin) Date: Thu, 18 Jul 2013 09:14:57 +0100 Subject: [LLVMdev] regarding compiling clang for different platform In-Reply-To: References: Message-ID: On 18 July 2013 08:50, Tim Northover wrote: > But then you'll hit the fun of trying to get Clang to find Android's > headers and libraries > Some Linaro folks are also trying to build Android with Clang and they have some wrappers to make clang work transparently (as a cross-compiler from Intel to ARM). They all hang out on the LLVMLinux mailing list or IRC channel (OFTC, #llvmlinux). There are also ABI issues, since androideabi is not gnueabi, which is not aeabi, and Clang/LLVM knows very little about the difference (but the kernel breaks because of enum sizes and other little things). I never built it myself, but I know that it's not trivial because the Android build system is something of a marvel of the modern world that has GCC hard-coded all over the place. Using the Clang wrapper should work with a standard Clang binary (if it has the ARM back-end), so the way you build Clang shouldn't matter much. Good luck! --renato -------------- next part -------------- An HTML attachment was scrubbed... URL: From mihail.popa at gmail.com Thu Jul 18 07:36:49 2013 From: mihail.popa at gmail.com (Mihail Popa) Date: Thu, 18 Jul 2013 15:36:49 +0100 Subject: [LLVMdev] Trap instruction for ARMv7 and Thumb Message-ID: Hi group. I was wondering why the "trap" instruction is implemented in the ARM backend as an undefined opcode. For ARM mode, it uses 0xe7ffdefe, for Thumb 0xdefe. Why not use the BKPT #imm instruction? Does anybody remember the reason behind this? Thanks, Mihai -------------- next part -------------- An HTML attachment was scrubbed...
URL: From Andrea_DiBiagio at sn.scee.net Thu Jul 18 08:23:28 2013 From: Andrea_DiBiagio at sn.scee.net (Andrea_DiBiagio at sn.scee.net) Date: Thu, 18 Jul 2013 16:23:28 +0100 Subject: [LLVMdev] [RFC] add Function Attribute to disable optimization In-Reply-To: References: <51BF7516.4070800@mxc.ca> Message-ID: So.. I have investigated more on how a new function attribute to disable optimization on a per-function basis could be implemented. At the current state, with the lack of specific support from the pass managers I found two big problems when trying to implement a prototype implementation of the new attribute. Here are the problems found: 1) It is not safe to disable some transform passes in the backend. It looks like there are some unwritten dependences between passes and disrupting the sequence of passes to run may result in unexpected crashes and/or assertion failures; 2) The fact that pass managers are not currently designed to support per-function optimization makes it difficult to find a reasonable way to implement this new feature. About point 2. the Idea that came in my mind consisted in making passes aware of the 'noopt' attribute. In my experiment: - I added a virtual method called 'mustAlwaysRun' in class Pass that 'returns true if it is not safe to disable this pass'. If a pass does not override the default implementation of that method, then by default it will always return true (i.e. the pass "must always run" pass even when attribute 'noopt' is specified). - I then redefined in override that method on all the optimization passes that could have been safely turned off when attribute noopt was present. In my experiment, I specifically didn't disable Module Passes; - Then I modified the 'doInitialize()' 'run*()' and 'doFinalize' methods in Pass Manger to check for both the presence of attribute noopt AND the value returned by method 'mustAlwaysRun' called on the current pass instance. That experiment seemed to "work" on a few tests and benchmarks. 
However: a) 'noopt' wouldn't really imply no optimization, since not all codegen optimization passes can be safely disabled. As a result, the assembly produced for noopt functions had few differences with respect to the assembly generated for the same functions at -O0; b) I don't particularly like the idea of making passes "aware" of the 'noopt' attribute. However, I don't know if there is a reasonable way to implement the noopt attribute without having to re-design how pass managers work. c) Because of a. and b., I am concerned that a change like the one described above won't be accepted. If so however, I would be really interested in the feedback from the community. Maybe there are better ways to implement 'noopt' which I don't know/didn't think about? As I said, I am not very happy with the proposed solution and any feedback would be really appreciated at this point. By the way, here is how I thought the 'noopt' proposal could have been contributed in terms of patches: [LLVM IR][Patch 1] ================ This patch extends the IR adding a new attribute called 'noopt'. Below, is a sequence of steps which describes how to implement this patch. 1) Add a definition for attribute 'noopt' in File llvm/IR/Attribute.h; 2) Teach how attribute 'noopt' should be encoded and also how to print it out as a string value (File lib/IR/Attributes.cpp); 2b) Add a new enum value for the new attribute in enum LLVMAttribute (File "include/llvm-c/Core.h"); 3) The new attribute is a function attribute; Teach the verifier pass that 'noopt' is a function attribute; Add checks in method VerifyAttributeTypes() (File lib/IR/Verifier.cpp): * NoOpt is a function-only attribute; * Assert if NoOpt is used in the same context as alwaysinline; * Assert if NoOpt is used in the same context as OptimizeForSize (needed?); * Assert if NoOpt is used in the same context as MinSize (needed?). 
4) Add a LLVM test in test/Feature to verify that we correctly disassemble the new function attribute (see for example file cold.ll); 5) Teach the AsmParser how to parse the new attribute: * Add a new token for the new attribute noopt; * Add rules to parse the new token; 6) Add a description of the new attribute in "docs/LangRef.rst"; [LLVM][Opt][Patch 2] ================== This patch implements the required changes to passes and pass managers. Below, is a sequence of steps which describes how to implement this patch. 1) Make the new inliner aware of the new flag. * In lib/Transforms/IPO/Inliner.cpp: ** do not inline the callee if it is not always_inline and the caller is marked 'noopt'. * No other changes are required since 'noopt' already implies 'noinline'. 2) Tell the pass manager which transform passes can be safely disabled with 'noopt'. [CLANG][Patch 3] =============== This patch teaches clang how to parse and generate code for functions that are marked with attribute 'noopt'. 1) Lex * Add a new token for the 'noopt' keyword. * That keyword is for a function attribute. 2) Sema * Add a rule to handle the case where noopt is passed as function attribute. * check that the attribute does not take extra arguments. * check that the attribute is associated to a function declaration. * Add the attribute to the IR Set of Attributes. 3) CodeGen * noopt implies 'noinline. * noopt always wins over always_inline * noopt does not win over 'naked': naked functions only contain asm statements. This attribute is only valid for ARM, AVX, MCORE, RL78, RX and SPU to indicate that the specified function does not need prologue/epilogue sequence generated by the compiler. (NOTE: this constraint can be removed). 4) Add clang tests: * in test/Sema: ** Verify that noopt only applies to functions. 
(-cc1 -fsyntax-only -verify) * in test/CodeGen: ** Check that noopt implies noinline ** Check combinations of noopt and noinline and always_inline Andrea Di Biagio SN Systems - Sony Computer Entertainment Group. Andrea DiBiagio/SN R&D/BS/UK/SCEE wrote on 25/06/2013 15:20:12: > From: Andrea DiBiagio/SN R&D/BS/UK/SCEE > To: Nick Lewycky > Cc: cfe-dev at cs.uiuc.edu, llvmdev at cs.uiuc.edu > Date: 25/06/2013 15:20 > Subject: Re: [LLVMdev] [RFC] add Function Attribute to disable optimization > > Hi Nick, > > > From: Nick Lewycky > > > This proposal is to create a new function-level attribute which would tell > > > the compiler to not to perform any optimizing transformations on the > > > specified function. > > > > What about module passes? Do you want to disable all module passes in a > > TU which contains a single one of these? I'll be unhappy if we need to > > litter checks throughout the module passes that determine whether a > > given instruction is inside an unoptimizable function or not. Saying > > that module passes are exempt from checking the 'noopt' attribute is > > fine to me, but then somebody needs to know how to module passes (and > > users may be surprised to discover that adding such an annotation to one > > function will cause seemingly-unrelated functions to become less optimized). > Right, module passes are a difficult case. > I understand your point. I think ignoring the `noopt' attribute (or > whatever we want to call it) may be the best approach in this case: > it avoid the problems you describe but should still be sufficient > for the purposes we care about. I am currently studying the module > passes in more details to be certain about this. 
> Thanks for the useful feedback, > Andrea Di Biagio > SN Systems - Sony Computer Entertainment Group > > > > The use-case is to be able to selectively disable optimizations when > > > debugging a small number of functions in a compilation unit to provide an > > > -O0-like quality of debugging in cases where compiling the whole unit at > > > anything less than full optimization would make the program run too > > > slowly. A useful secondary-effect of this feature would be to allow users > > > to temporarily work-around optimization bugs in LLVM without having to > > > reduce the optimization level for the whole compilation unit, however we > > > do not consider this the most important use-case. > > > > > > Our suggestion for the name for this attribute is "optnone" which seems to > > > be in keeping with the existing "optsize" attribute, although it could > > > equally be called "noopt" or something else entirely. It would be exposed > > > to Clang users through __attribute__((optnone)) or [[optnone]]. > > > > > > I would like to discuss this proposal with the rest of the community to > > > share opinions and have feedback on this. > > > > > > =================================================== > > > Interactions with the existing function attributes: > > > > > > LLVM allows to decorate functions with 'noinline', alwaysinline' and > > > 'inlinehint'. We think that it makes sense for 'optnone' to implicitly > > > imply 'noinline' (at least from a user's point of view) and therefore > > > 'optnone' should be considered incompatible with 'alwaysinline' and > > > 'inlinehint'. > > > > > > Example: > > > __attribute__((optnone, always_inline)) > > > void foo() { ... } > > > > > > In this case we could make 'optnone' override 'alwaysinline'. The effect > > > would be that 'alwaysinline' wouldn't appear in the IR if 'optnone' is > > > specified. 
> > > > > > Under the assumption that 'optnone' implies 'noinline', other things that > > > should be taken into account are: > > > 1) functions marked as 'optnone' should never be considered as potential > > > candidates for inlining; > > > 2) the inliner shouldn't try to inline a function if the call site is in a > > > 'optnone' function. > > > > > > Point 1 can be easily achieved by simply pushing attribute 'noinline' on > > > the list of function attributes if 'optnone' is used. > > > point 2 however would probably require to teach the Inliner about > > > 'optnone' and how to deal with it. > > > > > > As in the case of 'alwaysinline' and 'inlinehint', I think 'optnone' > > > should also override 'optsize'/'minsize'. > > > > > > Last (but not least), implementing 'optnone' would still require changes > > > in how optimizations are run on functions. This last part is probably the > > > hardest part since the current optimizer does not allow the level of > > > flexibility required by 'optnone'. It seems it would either require some > > > modifications to the Pass Manager or we would have to make individual > > > passes aware of the attribute. Neither of these solutions seem > > > particularly attractive to me, so I'm open to any suggestions! > > > > > > Thanks, > > > Andrea Di Biagio > > > SN Systems - Sony Computer Entertainment Group > > > > > > > > > ********************************************************************** > > > This email and any files transmitted with it are confidential and intended > > > solely for the use of the individual or entity to whom they are addressed. > > > If you have received this email in error please notify postmaster at scee.net > > > This footnote also confirms that this email message has been checked for > > > all known viruses. 
> > > Sony Computer Entertainment Europe Limited > > > Registered Office: 10 Great Marlborough Street, London W1F 7LP, United > > > Kingdom > > > Registered in England: 3277793 > > > ********************************************************************** > > > > > > P Please consider the environment before printing this e-mail > > > _______________________________________________ > > > LLVM Developers mailing list > > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > > > > ********************************************************************** > This email and any files transmitted with it are confidential and > intended solely for the use of the individual or entity to whom they > are addressed. If you have received this email in error please > notify postmaster at scee.net > This footnote also confirms that this email message has been checked > for all known viruses. > Sony Computer Entertainment Europe Limited > Registered Office: 10 Great Marlborough Street, London W1F 7LP, United Kingdom > Registered in England: 3277793 > ********************************************************************** > > P Please consider the environment before printing this e-mail ********************************************************************** This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify postmaster at scee.net This footnote also confirms that this email message has been checked for all known viruses. 
Sony Computer Entertainment Europe Limited Registered Office: 10 Great Marlborough Street, London W1F 7LP, United Kingdom Registered in England: 3277793 ********************************************************************** P Please consider the environment before printing this e-mail From pmiscml at gmail.com Thu Jul 18 08:34:25 2013 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Thu, 18 Jul 2013 18:34:25 +0300 Subject: [LLVMdev] Some experiences using LLVM C Backend Message-ID: <20130718183425.1850354f@x34f> Hello, I'm interested in LLVM as an opportunity to support C++ programming for legacy MCUs (8051, PIC1x, etc.). Recently, I tried to use C Backend as means to achieve this. As C Backend was removed in recent LLVM versions, I started with LLVM 3.0 which was last version to include it. For starters, I played with MSP430 target, which is supported by LLVM, so allows roundtrip experiments (comparing results of C++ -> MSP430 assembly vs C++ -> C -> MSP430 assembly compilation). One of the first things I saw with C Backend output was main() being declared as "unsigned" which then caused error when fed to Clang (main() should be int). Next issue was handling of inline asm. It looked that the corresponding code in C backend wasn't tested and had few thinkos. Don't get wrong - I'm glad it was written and it was easy to fix. There were few other small issues like missing includes (stdint.h). With that, I was able to achieve perfect roundtrip with trivial by functionality, but still using few layers of C++ magic (templates and inline functions) blink example (this one specifically: https://github.com/pfalcon/PeripheralTemplateLibrary/blob/master/examples/blink.cpp , the repository also has Makefiles for LLVM). My patches to LLVM 3.0 are available at https://github.com/pfalcon/llvm/commits/release_30-cbackend-fixes . 
My next step was to try to integrate them into the "cbe_revival" patchset started by Roel Jordans and available at https://github.com/glycerine/llvm/tree/cbe_revival . I found that this branch doesn't build out of the box (one header changed its location), and then I was greeted by an "Inline assambler not supported" assertion (note the typo in the word "assembler"). So, please consider reinstating inline asm support, because otherwise, at least for the use case discussed, it's more productive to use LLVM 3.0. Besides blink.cpp, I have so far quickly looked into a few other (still pretty simple) examples: some of them achieve a perfect roundtrip; some differ in arithmetic sequences, which definitely has something to do with C char -> int promotion, and at this time I cannot say whether the C backend code was equivalent; some differ in basic block ordering (i.e. different BB order when flattening the CFG into an instruction stream); and some actually differ in CFG (one issue I spotted involves tail duplication of inline asm statements; I hope to post a patch soon). I hope to do more detailed and formal roundtrip comparisons on a larger code corpus and report results later (if someone can suggest a utility to perform fuzzy graph isomorphism checks, that would be appreciated). -- Best regards, Paul mailto:pmiscml at gmail.com From lee2041412 at gmail.com Thu Jul 18 08:36:58 2013 From: lee2041412 at gmail.com (Yun-Wei Lee) Date: Thu, 18 Jul 2013 10:36:58 -0500 Subject: [LLVMdev] Request to review patch for bug #14792 Message-ID: http://llvm.org/bugs/show_bug.cgi?id=14792 Problem: In the i386 ABI, page 3-10, it is said that the stack is aligned. However, the two code examples show that the alignment is not handled correctly when using a variadic function. For example, if the size of the first argument is 17, the overflow_arg_area in va_list will be set to "address of first argument + 16" instead of "address of first argument + 24" after calling va_start.
In addition, #6636 showed the same problem because on AMD64, arguments are passed in registers first, then passed in memory once the registers run out (AMD64 ABI 3.5.7, rule 10). Why does this problem happen? When calling va_start to set up the va_list, overflow_arg_area is not set correctly. To set overflow_arg_area correctly, we need to get the FrameIndex correctly. Here comes the problem: LLVM doesn't handle this correctly. It accounts for StackSize to compute the FrameIndex, and if the StackSize is not aligned, it will compute the wrong FrameIndex. As a result, overflow_arg_area will not be set correctly. My Solution: 1. Record the alignment if the argument is located in memory. 2. If it is a variadic function and the FrameIndex needs to be set, adjust the StackSize. -------------- next part -------------- An HTML attachment was scrubbed... URL: From brianherman at gmail.com Thu Jul 18 09:35:17 2013 From: brianherman at gmail.com (Brian Herman) Date: Thu, 18 Jul 2013 11:35:17 -0500 Subject: [LLVMdev] Compile Error SVN Message-ID: llvm[2]: Compiling FileSystemStatCache.cpp for Release+Asserts build FileSystemStatCache.cpp: In static member function 'static bool clang::FileSystemStatCache::get(const char*, stat&, bool, int*, clang::FileSystemStatCache*)': FileSystemStatCache.cpp:63: error: 'openFileForRead' is not a member of 'llvm::sys::fs' make[2]: *** [/root/llvm-3.3.src/tools/clang/lib/Basic/Release+Asserts/FileSystemStatCache.o] Error 1 make[2]: Leaving directory `/root/llvm-3.3.src/tools/clang/lib/Basic' make[1]: *** [Basic/.makeall] Error 2 make[1]: Leaving directory `/root/llvm-3.3.src/tools/clang/lib' make: *** [all] Error 1 Did I do something wrong? -- Thanks, Brian Herman college.nfshost.com -------------- next part -------------- An HTML attachment was scrubbed...
URL: From letz at grame.fr Thu Jul 18 09:07:14 2013 From: letz at grame.fr (=?windows-1252?Q?St=E9phane_Letz?=) Date: Thu, 18 Jul 2013 18:07:14 +0200 Subject: [LLVMdev] LLVM 3.3 JIT code speed Message-ID: <6E47DFE5-C272-4AF5-B348-1CCB0ABA5561@grame.fr> Hi, Our DSL LLVM IR emitted code (optimized with -O3 kind of IR ==> IR passes) runs slower when executed with the LLVM 3.3 JIT, compared to what we had with LLVM 3.1. What could be the reason? I tried to play with TargetOptions without any success… Here is the kind of code we use to allocate the JIT: EngineBuilder builder(fResult->fModule); builder.setOptLevel(CodeGenOpt::Aggressive); builder.setEngineKind(EngineKind::JIT); builder.setUseMCJIT(true); builder.setCodeModel(CodeModel::JITDefault); builder.setMCPU(llvm::sys::getHostCPUName()); TargetOptions targetOptions; targetOptions.NoFramePointerElim = true; targetOptions.LessPreciseFPMADOption = true; targetOptions.UnsafeFPMath = true; targetOptions.NoInfsFPMath = true; targetOptions.NoNaNsFPMath = true; targetOptions.GuaranteedTailCallOpt = true; builder.setTargetOptions(targetOptions); TargetMachine* tm = builder.selectTarget(); fJIT = builder.create(tm); if (!fJIT) { return false; } …. Any idea? Thanks. Stéphane Letz From nicholas at mxc.ca Thu Jul 18 09:38:29 2013 From: nicholas at mxc.ca (Nick Lewycky) Date: Thu, 18 Jul 2013 09:38:29 -0700 Subject: [LLVMdev] [RFC] add Function Attribute to disable optimization In-Reply-To: References: <51BF7516.4070800@mxc.ca> Message-ID: <51E81A05.9040206@mxc.ca> Andrea_DiBiagio at sn.scee.net wrote: > So.. > I have investigated more on how a new function attribute to disable > optimization on a per-function basis could be implemented. > At the current state, with the lack of specific support from the pass > managers I found two big problems when trying to implement a prototype > implementation of the new attribute. > > Here are the problems found: > 1) It is not safe to disable some transform passes in the backend. Hold on. 
By 'backend' do you mean LLVM IR FunctionPasses, or LLVM CodeGen MachineFunctionPasses? The former you should be able to turn off, reorder, etc., at will. The latter is a fixed pipeline, you either choose the -O0 pipeline or the -O2 pipeline. If you find that turning off IR-level passes is triggering assertions, please start filing bugs. Nick > It looks like there are some unwritten dependencies between passes, and > disrupting the sequence of passes to run may result in unexpected crashes > and/or assertion failures; > 2) The fact that pass managers are not currently designed to support > per-function optimization makes it difficult to find a reasonable way to > implement this new feature. > > About point 2, the idea that came to my mind consisted in making passes > aware of the 'noopt' attribute. > In my experiment: > - I added a virtual method called 'mustAlwaysRun' in class Pass that > 'returns true if it is not safe to disable this pass'. > If a pass does not override the default implementation of that method, > then by default it will always return true (i.e. the pass "must > always run" even when attribute 'noopt' is specified). > - I then overrode that method on all the optimization passes > that could be safely turned off when attribute noopt was present. > In my experiment, I specifically didn't disable Module Passes; > - Then I modified the 'doInitialize()', 'run*()' and 'doFinalize()' methods > in the Pass Manager to check for both the presence of attribute noopt AND the > value returned by method 'mustAlwaysRun' called on the current pass > instance. > > That experiment seemed to "work" on a few tests and benchmarks. > However: > a) 'noopt' wouldn't really imply no optimization, since not all codegen > optimization passes can be safely disabled. 
As a result, the assembly > produced for noopt functions had few differences with respect to the > assembly generated for the same functions at -O0; > b) I don't particularly like the idea of making passes "aware" of the > 'noopt' attribute. However, I don't know if there is a reasonable way to > implement the noopt attribute without having to re-design how pass > managers work. > c) Because of a. and b., I am concerned that a change like the one > described above won't be accepted. If so however, I would be really > interested in the feedback from the community. Maybe there are better ways > to implement 'noopt' which I don't know/didn't think about? > > As I said, I am not very happy with the proposed solution and any feedback > would be really appreciated at this point. > > By the way, here is how I thought the 'noopt' proposal could have been > contributed in terms of patches: > > [LLVM IR][Patch 1] > ================ > This patch extends the IR adding a new attribute called 'noopt'. > Below, is a sequence of steps which describes how to implement this patch. > > 1) Add a definition for attribute 'noopt' in File llvm/IR/Attribute.h; > 2) Teach how attribute 'noopt' should be encoded and also how to print it > out > as a string value (File lib/IR/Attributes.cpp); > 2b) Add a new enum value for the new attribute in enum LLVMAttribute > (File "include/llvm-c/Core.h"); > 3) The new attribute is a function attribute; > Teach the verifier pass that 'noopt' is a function attribute; > Add checks in method VerifyAttributeTypes() (File lib/IR/Verifier.cpp): > * NoOpt is a function-only attribute; > * Assert if NoOpt is used in the same context as alwaysinline; > * Assert if NoOpt is used in the same context as OptimizeForSize > (needed?); > * Assert if NoOpt is used in the same context as MinSize (needed?). 
> 4) Add a LLVM test in test/Feature to verify that we correctly disassemble > the new function attribute (see for example file cold.ll); > 5) Teach the AsmParser how to parse the new attribute: > * Add a new token for the new attribute noopt; > * Add rules to parse the new token; > 6) Add a description of the new attribute in "docs/LangRef.rst"; > > [LLVM][Opt][Patch 2] > ================== > This patch implements the required changes to passes and pass managers. > Below, is a sequence of steps which describes how to implement this patch. > > 1) Make the new inliner aware of the new flag. > * In lib/Transforms/IPO/Inliner.cpp: > ** do not inline the callee if it is not always_inline and the caller > is marked 'noopt'. > * No other changes are required since 'noopt' already implies 'noinline'. > 2) Tell the pass manager which transform passes can be safely disabled > with 'noopt'. > > [CLANG][Patch 3] > =============== > This patch teaches clang how to parse and generate code for functions that > are marked with attribute 'noopt'. > > 1) Lex > * Add a new token for the 'noopt' keyword. > * That keyword is for a function attribute. > 2) Sema > * Add a rule to handle the case where noopt is passed as function > attribute. > * check that the attribute does not take extra arguments. > * check that the attribute is associated to a function declaration. > * Add the attribute to the IR Set of Attributes. > 3) CodeGen > * noopt implies 'noinline. > * noopt always wins over always_inline > * noopt does not win over 'naked': naked functions only contain asm > statements. This attribute is only valid for ARM, AVX, MCORE, RL78, RX > and > SPU to indicate that the specified function does not need > prologue/epilogue > sequence generated by the compiler. (NOTE: this constraint can be > removed). > 4) Add clang tests: > * in test/Sema: > ** Verify that noopt only applies to functions. 
(-cc1 -fsyntax-only > -verify) > * in test/CodeGen: > ** Check that noopt implies noinline > ** Check combinations of noopt and noinline and always_inline > > > Andrea Di Biagio > SN Systems - Sony Computer Entertainment Group. > > Andrea DiBiagio/SN R&D/BS/UK/SCEE wrote on 25/06/2013 15:20:12: > >> From: Andrea DiBiagio/SN R&D/BS/UK/SCEE >> To: Nick Lewycky >> Cc: cfe-dev at cs.uiuc.edu, llvmdev at cs.uiuc.edu >> Date: 25/06/2013 15:20 >> Subject: Re: [LLVMdev] [RFC] add Function Attribute to disable > optimization >> >> Hi Nick, >> >>> From: Nick Lewycky >>>> This proposal is to create a new function-level attribute which > would tell >>>> the compiler to not to perform any optimizing transformations on the >>>> specified function. >>> >>> What about module passes? Do you want to disable all module passes in > a >>> TU which contains a single one of these? I'll be unhappy if we need to > >>> litter checks throughout the module passes that determine whether a >>> given instruction is inside an unoptimizable function or not. Saying >>> that module passes are exempt from checking the 'noopt' attribute is >>> fine to me, but then somebody needs to know how to module passes (and >>> users may be surprised to discover that adding such an annotation to > one >>> function will cause seemingly-unrelated functions to become less > optimized). > >> Right, module passes are a difficult case. >> I understand your point. I think ignoring the `noopt' attribute (or >> whatever we want to call it) may be the best approach in this case: >> it avoid the problems you describe but should still be sufficient >> for the purposes we care about. I am currently studying the module >> passes in more details to be certain about this. 
> >> Thanks for the useful feedback, >> Andrea Di Biagio >> SN Systems - Sony Computer Entertainment Group >> >>>> The use-case is to be able to selectively disable optimizations when >>>> debugging a small number of functions in a compilation unit to > provide an >>>> -O0-like quality of debugging in cases where compiling the whole > unit at >>>> anything less than full optimization would make the program run too >>>> slowly. A useful secondary-effect of this feature would be to allow > users >>>> to temporarily work-around optimization bugs in LLVM without having > to >>>> reduce the optimization level for the whole compilation unit, > however we >>>> do not consider this the most important use-case. >>>> >>>> Our suggestion for the name for this attribute is "optnone" which > seems to >>>> be in keeping with the existing "optsize" attribute, although it > could >>>> equally be called "noopt" or something else entirely. It would be > exposed >>>> to Clang users through __attribute__((optnone)) or [[optnone]]. >>>> >>>> I would like to discuss this proposal with the rest of the community > to >>>> share opinions and have feedback on this. >>>> >>>> =================================================== >>>> Interactions with the existing function attributes: >>>> >>>> LLVM allows to decorate functions with 'noinline', alwaysinline' and >>>> 'inlinehint'. We think that it makes sense for 'optnone' to > implicitly >>>> imply 'noinline' (at least from a user's point of view) and > therefore >>>> 'optnone' should be considered incompatible with 'alwaysinline' and >>>> 'inlinehint'. >>>> >>>> Example: >>>> __attribute__((optnone, always_inline)) >>>> void foo() { ... } >>>> >>>> In this case we could make 'optnone' override 'alwaysinline'. The > effect >>>> would be that 'alwaysinline' wouldn't appear in the IR if 'optnone' > is >>>> specified. 
>>>> >>>> Under the assumption that 'optnone' implies 'noinline', other things > that >>>> should be taken into account are: >>>> 1) functions marked as 'optnone' should never be considered as > potential >>>> candidates for inlining; >>>> 2) the inliner shouldn't try to inline a function if the call site > is in a >>>> 'optnone' function. >>>> >>>> Point 1 can be easily achieved by simply pushing attribute > 'noinline' on >>>> the list of function attributes if 'optnone' is used. >>>> point 2 however would probably require to teach the Inliner about >>>> 'optnone' and how to deal with it. >>>> >>>> As in the case of 'alwaysinline' and 'inlinehint', I think 'optnone' >>>> should also override 'optsize'/'minsize'. >>>> >>>> Last (but not least), implementing 'optnone' would still require > changes >>>> in how optimizations are run on functions. This last part is > probably the >>>> hardest part since the current optimizer does not allow the level of >>>> flexibility required by 'optnone'. It seems it would either require > some >>>> modifications to the Pass Manager or we would have to make > individual >>>> passes aware of the attribute. Neither of these solutions seem >>>> particularly attractive to me, so I'm open to any suggestions! >>>> >>>> Thanks, >>>> Andrea Di Biagio >>>> SN Systems - Sony Computer Entertainment Group >>>> >>>> >>>> > ********************************************************************** >>>> This email and any files transmitted with it are confidential and > intended >>>> solely for the use of the individual or entity to whom they are > addressed. >>>> If you have received this email in error please notify > postmaster at scee.net >>>> This footnote also confirms that this email message has been checked > for >>>> all known viruses. 
>>>> Sony Computer Entertainment Europe Limited >>>> Registered Office: 10 Great Marlborough Street, London W1F 7LP, > United >>>> Kingdom >>>> Registered in England: 3277793 >>>> > ********************************************************************** >>>> >>>> P Please consider the environment before printing this e-mail >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>> >>> >> >> >> ********************************************************************** >> This email and any files transmitted with it are confidential and >> intended solely for the use of the individual or entity to whom they >> are addressed. If you have received this email in error please >> notify postmaster at scee.net >> This footnote also confirms that this email message has been checked >> for all known viruses. >> Sony Computer Entertainment Europe Limited >> Registered Office: 10 Great Marlborough Street, London W1F 7LP, United > Kingdom >> Registered in England: 3277793 >> ********************************************************************** >> >> P Please consider the environment before printing this e-mail > > ********************************************************************** > This email and any files transmitted with it are confidential and intended > solely for the use of the individual or entity to whom they are addressed. > If you have received this email in error please notify postmaster at scee.net > This footnote also confirms that this email message has been checked for > all known viruses. 
> Sony Computer Entertainment Europe Limited > Registered Office: 10 Great Marlborough Street, London W1F 7LP, United > Kingdom > Registered in England: 3277793 > ********************************************************************** > > P Please consider the environment before printing this e-mail > From brianherman at gmail.com Thu Jul 18 09:44:54 2013 From: brianherman at gmail.com (Brian Herman) Date: Thu, 18 Jul 2013 11:44:54 -0500 Subject: [LLVMdev] Compile Error SVN In-Reply-To: References: Message-ID: error_code llvm::sys::fs::openFileForRead (const Twine & *Name*, int & *ResultFD* ) Referenced by llvm::MemoryBuffer::getFile() . This is the corresponding llvm page on openFileForRead. Did it change in the svn repo? On Thu, Jul 18, 2013 at 11:35 AM, Brian Herman wrote: > llvm[2]: Compiling FileSystemStatCache.cpp for Release+Asserts build > FileSystemStatCache.cpp: In static member function 'static bool > clang::FileSystemStatCache::get(const char*, stat&, bool, int*, > clang::FileSystemStatCache*)': > FileSystemStatCache.cpp:63: error: 'openFileForRead' is not a member of > 'llvm::sys::fs' > make[2]: *** > [/root/llvm-3.3.src/tools/clang/lib/Basic/Release+Asserts/FileSystemStatCache.o] > Error 1 > make[2]: Leaving directory `/root/llvm-3.3.src/tools/clang/lib/Basic' > make[1]: *** [Basic/.makeall] Error 2 > make[1]: Leaving directory `/root/llvm-3.3.src/tools/clang/lib' > make: *** [all] Error 1 > > Did I do something wrong? > > -- > > > Thanks, > Brian Herman > college.nfshost.com > > > > > -- Thanks, Brian Herman college.nfshost.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jevinsweval at gmail.com Thu Jul 18 09:45:32 2013 From: jevinsweval at gmail.com (Jevin Sweval) Date: Thu, 18 Jul 2013 12:45:32 -0400 Subject: [LLVMdev] Proposal: function prefix data In-Reply-To: <20130718010609.GA17472@pcc.me.uk> References: <20130718010609.GA17472@pcc.me.uk> Message-ID: On Wed, Jul 17, 2013 at 9:06 PM, Peter Collingbourne wrote: > > To maintain the semantics of ordinary function calls, the prefix data > must have a particular format. Specifically, it must begin with a > sequence of bytes which decode to a sequence of machine instructions, > valid for the module's target, which transfer control to the point > immediately succeeding the prefix data, without performing any other > visible action. This allows the inliner and other passes to reason > about the semantics of the function definition without needing to > reason about the prefix data. Obviously this makes the format of the > prefix data highly target dependent. What if the prefix data was stored before the start of the function code? The function's symbol will point to the code just as before, eliminating the need to have instructions that skip the prefix data. It would look something like: | Prefix Data ... (variable length) | Prefix Data Length (fixed length [32 bits?]) | Function code .... | ^ function symbol points here (function code) I hope the simple ASCII art makes it through my mail client. To access the data, you do prefix_data = function_ptr - sizeof(prefix_length) - prefix_length Cheers, Jevin From jfb at google.com Thu Jul 18 09:47:24 2013 From: jfb at google.com (JF Bastien) Date: Thu, 18 Jul 2013 09:47:24 -0700 Subject: [LLVMdev] Trap instruction for ARMv7 and Thumb In-Reply-To: References: Message-ID: The OS delivers different signals on each of these instructions. I had a discussion about this with Jim Grosbach when checking in the NaCl TRAP, see the commit archive. On Thu, Jul 18, 2013 at 7:36 AM, Mihail Popa wrote: > HI group. 
> > I was wondering why the "trap" instruction is implemented in the ARM > backend as an undefined opcode. For ARM mode, it uses 0xe7ffdefe, for > Thumb 0xdefe. > > Why not use the BKPT #imm instruction? > Does anybody remember the reason behind this? > > Thanks, > Mihai > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Andrea_DiBiagio at sn.scee.net Thu Jul 18 09:48:54 2013 From: Andrea_DiBiagio at sn.scee.net (Andrea_DiBiagio at sn.scee.net) Date: Thu, 18 Jul 2013 17:48:54 +0100 Subject: [LLVMdev] [RFC] add Function Attribute to disable optimization In-Reply-To: <51E81A05.9040206@mxc.ca> References: <51BF7516.4070800@mxc.ca> <51E81A05.9040206@mxc.ca> Message-ID: > From: Nick Lewycky > Andrea_DiBiagio at sn.scee.net wrote: > > So.. > > I have investigated more on how a new function attribute to disable > > optimization on a per-function basis could be implemented. > > At the current state, with the lack of specific support from the pass > > managers I found two big problems when trying to implement a prototype > > implementation of the new attribute. > > > > Here are the problems found: > > 1) It is not safe to disable some transform passes in the backend. > > Hold on. By 'backend' do you mean LLVM IR FunctionPasses, or LLVM > CodeGen MachineFunctionPasses? The former you should be able to turn > off, reorder, etc., at will. The latter is a fixed pipeline, you either > choose the -O0 pipeline or the -O2 pipeline. Sorry, I meant CodeGen MachineFunctionPasses. ********************************************************************** This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. 
If you have received this email in error please notify postmaster at scee.net This footnote also confirms that this email message has been checked for all known viruses. Sony Computer Entertainment Europe Limited Registered Office: 10 Great Marlborough Street, London W1F 7LP, United Kingdom Registered in England: 3277793 ********************************************************************** P Please consider the environment before printing this e-mail From eli.friedman at gmail.com Thu Jul 18 10:07:23 2013 From: eli.friedman at gmail.com (Eli Friedman) Date: Thu, 18 Jul 2013 10:07:23 -0700 Subject: [LLVMdev] LLVM 3.3 JIT code speed In-Reply-To: <6E47DFE5-C272-4AF5-B348-1CCB0ABA5561@grame.fr> References: <6E47DFE5-C272-4AF5-B348-1CCB0ABA5561@grame.fr> Message-ID: On Thu, Jul 18, 2013 at 9:07 AM, Stéphane Letz wrote: > Hi, > > Our DSL LLVM IR emitted code (optimized with -O3 kind of IR ==> IR passes) runs slower when executed with the LLVM 3.3 JIT, compared to what we had with LLVM 3.1. What could be the reason? > > I tried to play with TargetOptions without any success… > > Here is the kind of code we use to allocate the JIT: > > EngineBuilder builder(fResult->fModule); > builder.setOptLevel(CodeGenOpt::Aggressive); > builder.setEngineKind(EngineKind::JIT); > builder.setUseMCJIT(true); > builder.setCodeModel(CodeModel::JITDefault); > builder.setMCPU(llvm::sys::getHostCPUName()); > > TargetOptions targetOptions; > targetOptions.NoFramePointerElim = true; > targetOptions.LessPreciseFPMADOption = true; > targetOptions.UnsafeFPMath = true; > targetOptions.NoInfsFPMath = true; > targetOptions.NoNaNsFPMath = true; > targetOptions.GuaranteedTailCallOpt = true; > > builder.setTargetOptions(targetOptions); > > TargetMachine* tm = builder.selectTarget(); > > fJIT = builder.create(tm); > if (!fJIT) { > return false; > } > …. > > Any idea? It's hard to say much without seeing the specific IR and the code generated from that IR. 
-Eli From lidl at pix.net Thu Jul 18 11:04:38 2013 From: lidl at pix.net (Kurt Lidl) Date: Thu, 18 Jul 2013 14:04:38 -0400 Subject: [LLVMdev] clang searching for many linux directories that do not exist on FreeBSD host Message-ID: <51E82E36.7020505@pix.net> Greetings - I'm a user of clang (3.3), as it is the system compiler for my installation of FreeBSD. (In FreeBSD 10, it will be the default compiler, but that's not my point.) My system identifies itself as: FreeBSD 9.2-PRERELEASE #0: Tue Jul 16 13:00:08 EDT 2013 lidl at nine0:/usr/obj/usr/src/sys/GENERIC Recently, in preparation for the upcoming 9.2 release, they imported the llvm 3.3 tree. That works fine for me. I did notice (while looking at an unrelated problem) that clang looks around for a bunch of linux directories every time it is started. And those directories are never going to be found, at least not on a FreeBSD system. It was suggested that I take the issue here, rather than attempting to fix it with a locally maintained FreeBSD patch. I've included my trivial test program, and the ktrace output. I trimmed the output to remove the legitimate accesses to shared libraries, etc. that clang must make - leaving most of the extraneous accesses. It was easy to trace these patterns back to the file: tools/clang/lib/Driver/ToolChains.cpp So my question is this: Is there any easy modification to make that will allow clang to skip doing all this work for no gain? It seems silly to me, when it's being used as the system compiler, to have it call stat() a little over two hundred times, each time the compiler is started up. (There's also a very Mac-looking /System/Library/... stat() at the end too...) Thanks for any help. -Kurt lidl at nine0-309: cat hello.c #include #include int main(int argc, char *argv[]) { printf("Hello world!\n"); return 0; } lidl at nine0-310: ktrace -i clang -Wall hello.c lidl at nine0-311: kdump | egrep -e NAMI -e /usr/lib | awk '{print $4}' [...] 
"/usr/lib64" "/usr/lib" "/usr/lib/gcc/x86_64-linux-gnu" "/usr/lib/x86_64-linux-gnu/gcc/x86_64-linux-gnu" "/usr/lib/x86_64-linux-gnu" "/usr/lib/gcc/x86_64-unknown-linux-gnu" "/usr/lib/x86_64-unknown-linux-gnu/gcc/x86_64-unknown-linux-gnu" "/usr/lib/x86_64-unknown-linux-gnu" "/usr/lib/gcc/x86_64-pc-linux-gnu" "/usr/lib/x86_64-pc-linux-gnu/gcc/x86_64-pc-linux-gnu" "/usr/lib/x86_64-pc-linux-gnu" "/usr/lib/gcc/x86_64-redhat-linux6E" "/usr/lib/x86_64-redhat-linux6E/gcc/x86_64-redhat-linux6E" "/usr/lib/x86_64-redhat-linux6E" "/usr/lib/gcc/x86_64-redhat-linux" "/usr/lib/x86_64-redhat-linux/gcc/x86_64-redhat-linux" "/usr/lib/x86_64-redhat-linux" "/usr/lib/gcc/x86_64-suse-linux" "/usr/lib/x86_64-suse-linux/gcc/x86_64-suse-linux" "/usr/lib/x86_64-suse-linux" "/usr/lib/gcc/x86_64-manbo-linux-gnu" "/usr/lib/x86_64-manbo-linux-gnu/gcc/x86_64-manbo-linux-gnu" "/usr/lib/x86_64-manbo-linux-gnu" "/usr/lib/gcc/x86_64-linux-gnu" "/usr/lib/x86_64-linux-gnu/gcc/x86_64-linux-gnu" "/usr/lib/x86_64-linux-gnu" "/usr/lib/gcc/x86_64-slackware-linux" "/usr/lib/x86_64-slackware-linux/gcc/x86_64-slackware-linux" "/usr/lib/x86_64-slackware-linux" "/usr/lib/gcc/x86_64-unknown-freebsd9.2" "/usr/lib/x86_64-unknown-freebsd9.2/gcc/x86_64-unknown-freebsd9.2" "/usr/lib/x86_64-unknown-freebsd9.2" "/usr/lib32" "/usr/lib32/gcc/i686-linux-gnu" "/usr/lib32/i686-linux-gnu/gcc/i686-linux-gnu" "/usr/lib32/i686-linux-gnu" "/usr/lib32/gcc/i686-pc-linux-gnu" "/usr/lib32/i686-pc-linux-gnu/gcc/i686-pc-linux-gnu" "/usr/lib32/i686-pc-linux-gnu" "/usr/lib32/gcc/i486-linux-gnu" "/usr/lib32/i486-linux-gnu/gcc/i486-linux-gnu" "/usr/lib32/i486-linux-gnu" "/usr/lib32/gcc/i386-linux-gnu" "/usr/lib32/i386-linux-gnu/gcc/i386-linux-gnu" "/usr/lib32/i386-linux-gnu" "/usr/lib32/gcc/i386-redhat-linux6E" "/usr/lib32/i386-redhat-linux6E/gcc/i386-redhat-linux6E" "/usr/lib32/i386-redhat-linux6E" "/usr/lib32/gcc/i686-redhat-linux" "/usr/lib32/i686-redhat-linux/gcc/i686-redhat-linux" "/usr/lib32/i686-redhat-linux" 
"/usr/lib32/gcc/i586-redhat-linux" "/usr/lib32/i586-redhat-linux/gcc/i586-redhat-linux" "/usr/lib32/i586-redhat-linux" "/usr/lib32/gcc/i386-redhat-linux" "/usr/lib32/i386-redhat-linux/gcc/i386-redhat-linux" "/usr/lib32/i386-redhat-linux" "/usr/lib32/gcc/i586-suse-linux" "/usr/lib32/i586-suse-linux/gcc/i586-suse-linux" "/usr/lib32/i586-suse-linux" "/usr/lib32/gcc/i486-slackware-linux" "/usr/lib32/i486-slackware-linux/gcc/i486-slackware-linux" "/usr/lib32/i486-slackware-linux" "/usr/lib32/gcc/i686-montavista-linux" "/usr/lib32/i686-montavista-linux/gcc/i686-montavista-linux" "/usr/lib32/i686-montavista-linux" "/usr/lib32/gcc/i386-unknown-freebsd9.2" "/usr/lib32/i386-unknown-freebsd9.2/gcc/i386-unknown-freebsd9.2" "/usr/lib32/i386-unknown-freebsd9.2" "/usr/lib" "/usr/lib/gcc/i686-linux-gnu" "/usr/lib/i686-linux-gnu/gcc/i686-linux-gnu" "/usr/lib/i686-linux-gnu" "/usr/lib/gcc/i686-pc-linux-gnu" "/usr/lib/i686-pc-linux-gnu/gcc/i686-pc-linux-gnu" "/usr/lib/i686-pc-linux-gnu" "/usr/lib/gcc/i486-linux-gnu" "/usr/lib/i486-linux-gnu/gcc/i486-linux-gnu" "/usr/lib/i486-linux-gnu" "/usr/lib/gcc/i386-linux-gnu" "/usr/lib/i386-linux-gnu/gcc/i386-linux-gnu" "/usr/lib/i386-linux-gnu" "/usr/lib/gcc/i386-redhat-linux6E" "/usr/lib/i386-redhat-linux6E/gcc/i386-redhat-linux6E" "/usr/lib/i386-redhat-linux6E" "/usr/lib/gcc/i686-redhat-linux" "/usr/lib/i686-redhat-linux/gcc/i686-redhat-linux" "/usr/lib/i686-redhat-linux" "/usr/lib/gcc/i586-redhat-linux" "/usr/lib/i586-redhat-linux/gcc/i586-redhat-linux" "/usr/lib/i586-redhat-linux" "/usr/lib/gcc/i386-redhat-linux" "/usr/lib/i386-redhat-linux/gcc/i386-redhat-linux" "/usr/lib/i386-redhat-linux" "/usr/lib/gcc/i586-suse-linux" "/usr/lib/i586-suse-linux/gcc/i586-suse-linux" "/usr/lib/i586-suse-linux" "/usr/lib/gcc/i486-slackware-linux" "/usr/lib/i486-slackware-linux/gcc/i486-slackware-linux" "/usr/lib/i486-slackware-linux" "/usr/lib/gcc/i686-montavista-linux" "/usr/lib/i686-montavista-linux/gcc/i686-montavista-linux" 
"/usr/lib/i686-montavista-linux" "/usr/lib/gcc/i386-unknown-freebsd9.2" "/usr/lib/i386-unknown-freebsd9.2/gcc/i386-unknown-freebsd9.2" "/usr/lib/i386-unknown-freebsd9.2" "/usr/bin/.." "/usr/bin/../lib64" "/usr/bin/../lib" "/usr/bin/../lib/gcc/x86_64-linux-gnu" "/usr/bin/../lib/x86_64-linux-gnu/gcc/x86_64-linux-gnu" "/usr/bin/../lib/x86_64-linux-gnu" "/usr/bin/../lib/gcc/x86_64-unknown-linux-gnu" "/usr/bin/../lib/x86_64-unknown-linux-gnu/gcc/x86_64-unknown-linux-gnu" "/usr/bin/../lib/x86_64-unknown-linux-gnu" "/usr/bin/../lib/gcc/x86_64-pc-linux-gnu" "/usr/bin/../lib/x86_64-pc-linux-gnu/gcc/x86_64-pc-linux-gnu" "/usr/bin/../lib/x86_64-pc-linux-gnu" "/usr/bin/../lib/gcc/x86_64-redhat-linux6E" "/usr/bin/../lib/x86_64-redhat-linux6E/gcc/x86_64-redhat-linux6E" "/usr/bin/../lib/x86_64-redhat-linux6E" "/usr/bin/../lib/gcc/x86_64-redhat-linux" "/usr/bin/../lib/x86_64-redhat-linux/gcc/x86_64-redhat-linux" "/usr/bin/../lib/x86_64-redhat-linux" "/usr/bin/../lib/gcc/x86_64-suse-linux" "/usr/bin/../lib/x86_64-suse-linux/gcc/x86_64-suse-linux" "/usr/bin/../lib/x86_64-suse-linux" "/usr/bin/../lib/gcc/x86_64-manbo-linux-gnu" "/usr/bin/../lib/x86_64-manbo-linux-gnu/gcc/x86_64-manbo-linux-gnu" "/usr/bin/../lib/x86_64-manbo-linux-gnu" "/usr/bin/../lib/gcc/x86_64-linux-gnu" "/usr/bin/../lib/x86_64-linux-gnu/gcc/x86_64-linux-gnu" "/usr/bin/../lib/x86_64-linux-gnu" "/usr/bin/../lib/gcc/x86_64-slackware-linux" "/usr/bin/../lib/x86_64-slackware-linux/gcc/x86_64-slackware-linux" "/usr/bin/../lib/x86_64-slackware-linux" "/usr/bin/../lib/gcc/x86_64-unknown-freebsd9.2" "/usr/bin/../lib/x86_64-unknown-freebsd9.2/gcc/x86_64-unknown-freebsd9.2" "/usr/bin/../lib/x86_64-unknown-freebsd9.2" "/usr/bin/../lib32" "/usr/bin/../lib32/gcc/i686-linux-gnu" "/usr/bin/../lib32/i686-linux-gnu/gcc/i686-linux-gnu" "/usr/bin/../lib32/i686-linux-gnu" "/usr/bin/../lib32/gcc/i686-pc-linux-gnu" "/usr/bin/../lib32/i686-pc-linux-gnu/gcc/i686-pc-linux-gnu" "/usr/bin/../lib32/i686-pc-linux-gnu" 
"/usr/bin/../lib32/gcc/i486-linux-gnu" "/usr/bin/../lib32/i486-linux-gnu/gcc/i486-linux-gnu" "/usr/bin/../lib32/i486-linux-gnu" "/usr/bin/../lib32/gcc/i386-linux-gnu" "/usr/bin/../lib32/i386-linux-gnu/gcc/i386-linux-gnu" "/usr/bin/../lib32/i386-linux-gnu" "/usr/bin/../lib32/gcc/i386-redhat-linux6E" "/usr/bin/../lib32/i386-redhat-linux6E/gcc/i386-redhat-linux6E" "/usr/bin/../lib32/i386-redhat-linux6E" "/usr/bin/../lib32/gcc/i686-redhat-linux" "/usr/bin/../lib32/i686-redhat-linux/gcc/i686-redhat-linux" "/usr/bin/../lib32/i686-redhat-linux" "/usr/bin/../lib32/gcc/i586-redhat-linux" "/usr/bin/../lib32/i586-redhat-linux/gcc/i586-redhat-linux" "/usr/bin/../lib32/i586-redhat-linux" "/usr/bin/../lib32/gcc/i386-redhat-linux" "/usr/bin/../lib32/i386-redhat-linux/gcc/i386-redhat-linux" "/usr/bin/../lib32/i386-redhat-linux" "/usr/bin/../lib32/gcc/i586-suse-linux" "/usr/bin/../lib32/i586-suse-linux/gcc/i586-suse-linux" "/usr/bin/../lib32/i586-suse-linux" "/usr/bin/../lib32/gcc/i486-slackware-linux" "/usr/bin/../lib32/i486-slackware-linux/gcc/i486-slackware-linux" "/usr/bin/../lib32/i486-slackware-linux" "/usr/bin/../lib32/gcc/i686-montavista-linux" "/usr/bin/../lib32/i686-montavista-linux/gcc/i686-montavista-linux" "/usr/bin/../lib32/i686-montavista-linux" "/usr/bin/../lib32/gcc/i386-unknown-freebsd9.2" "/usr/bin/../lib32/i386-unknown-freebsd9.2/gcc/i386-unknown-freebsd9.2" "/usr/bin/../lib32/i386-unknown-freebsd9.2" "/usr/bin/../lib" "/usr/bin/../lib/gcc/i686-linux-gnu" "/usr/bin/../lib/i686-linux-gnu/gcc/i686-linux-gnu" "/usr/bin/../lib/i686-linux-gnu" "/usr/bin/../lib/gcc/i686-pc-linux-gnu" "/usr/bin/../lib/i686-pc-linux-gnu/gcc/i686-pc-linux-gnu" "/usr/bin/../lib/i686-pc-linux-gnu" "/usr/bin/../lib/gcc/i486-linux-gnu" "/usr/bin/../lib/i486-linux-gnu/gcc/i486-linux-gnu" "/usr/bin/../lib/i486-linux-gnu" "/usr/bin/../lib/gcc/i386-linux-gnu" "/usr/bin/../lib/i386-linux-gnu/gcc/i386-linux-gnu" "/usr/bin/../lib/i386-linux-gnu" "/usr/bin/../lib/gcc/i386-redhat-linux6E" 
"/usr/bin/../lib/i386-redhat-linux6E/gcc/i386-redhat-linux6E" "/usr/bin/../lib/i386-redhat-linux6E" "/usr/bin/../lib/gcc/i686-redhat-linux" "/usr/bin/../lib/i686-redhat-linux/gcc/i686-redhat-linux" "/usr/bin/../lib/i686-redhat-linux" "/usr/bin/../lib/gcc/i586-redhat-linux" "/usr/bin/../lib/i586-redhat-linux/gcc/i586-redhat-linux" "/usr/bin/../lib/i586-redhat-linux" "/usr/bin/../lib/gcc/i386-redhat-linux" "/usr/bin/../lib/i386-redhat-linux/gcc/i386-redhat-linux" "/usr/bin/../lib/i386-redhat-linux" "/usr/bin/../lib/gcc/i586-suse-linux" "/usr/bin/../lib/i586-suse-linux/gcc/i586-suse-linux" "/usr/bin/../lib/i586-suse-linux" "/usr/bin/../lib/gcc/i486-slackware-linux" "/usr/bin/../lib/i486-slackware-linux/gcc/i486-slackware-linux" "/usr/bin/../lib/i486-slackware-linux" "/usr/bin/../lib/gcc/i686-montavista-linux" "/usr/bin/../lib/i686-montavista-linux/gcc/i686-montavista-linux" "/usr/bin/../lib/i686-montavista-linux" "/usr/bin/../lib/gcc/i386-unknown-freebsd9.2" "/usr/bin/../lib/i386-unknown-freebsd9.2/gcc/i386-unknown-freebsd9.2" "/usr/bin/../lib/i386-unknown-freebsd9.2" [...] "/System/Library/CoreServices/SystemVersion.plist" [...] From letz at grame.fr Thu Jul 18 11:20:15 2013 From: letz at grame.fr (=?windows-1252?Q?St=E9phane_Letz?=) Date: Thu, 18 Jul 2013 20:20:15 +0200 Subject: [LLVMdev] LLVM 3.3 JIT code speed In-Reply-To: References: <6E47DFE5-C272-4AF5-B348-1CCB0ABA5561@grame.fr> Message-ID: <9E89BCB4-5B8F-438C-8716-2CE0EAA3757B@grame.fr> Le 18 juil. 2013 à 19:07, Eli Friedman a écrit : > On Thu, Jul 18, 2013 at 9:07 AM, Stéphane Letz wrote: >> Hi, >> >> Our DSL LLVM IR emitted code (optimized with -O3 kind of IR ==> IR passes) runs slower when executed with the LLVM 3.3 JIT, compared to what we had with LLVM 3.1. What could be the reason? 
>> >> I tried to play with TargetOptions without any success… >> >> Here is the kind of code we use to allocate the JIT: >> >> EngineBuilder builder(fResult->fModule); >> builder.setOptLevel(CodeGenOpt::Aggressive); >> builder.setEngineKind(EngineKind::JIT); >> builder.setUseMCJIT(true); >> builder.setCodeModel(CodeModel::JITDefault); >> builder.setMCPU(llvm::sys::getHostCPUName()); >> >> TargetOptions targetOptions; >> targetOptions.NoFramePointerElim = true; >> targetOptions.LessPreciseFPMADOption = true; >> targetOptions.UnsafeFPMath = true; >> targetOptions.NoInfsFPMath = true; >> targetOptions.NoNaNsFPMath = true; >> targetOptions.GuaranteedTailCallOpt = true; >> >> builder.setTargetOptions(targetOptions); >> >> TargetMachine* tm = builder.selectTarget(); >> >> fJIT = builder.create(tm); >> if (!fJIT) { >> return false; >> } >> …. >> >> Any idea? > > It's hard to say much without seeing the specific IR and the code > generated from that IR. > > -Eli Our language can do either: 1) DSL ==> C/C++ ===> clang/gcc ===> exec code or 2) DSL ==> LLVM IR ===> (optimisation passes) ==> LLVM IR ==> LLVM JIT ==> exec code 1) and 2) were running at the same speed with LLVM 3.1, but 2) is now slower with LLVM 3.3 I compared the LLVM IR that is generated by the 2) chain *after* the optimization passes, with the one that is generated with 1) and clang -emit-llvm -O3 with the pure C input. The two are the same. So my conclusion was that the way we are activating the JIT is no longer correct in 3.3, or we are missing new steps that have to be done in JIT? 
Stéphane Letz From atrick at apple.com Thu Jul 18 11:28:01 2013 From: atrick at apple.com (Andrew Trick) Date: Thu, 18 Jul 2013 11:28:01 -0700 Subject: [LLVMdev] Nested Loop Unrolling In-Reply-To: <3A95D635-6EAA-484D-965E-A9DDFA1940D4@yahoo.com> References: <3A95D635-6EAA-484D-965E-A9DDFA1940D4@yahoo.com> Message-ID: <3703DEAD-9B8A-4093-85AC-47D4A99BA0D3@apple.com> On Jul 17, 2013, at 5:11 PM, Ali Javadi wrote: > Hi, > > In LLVM (using the opt tool), is it possible to force a nested loop be unrolled entirely? Something like a pass option? > I have a nested loop with depth of 4, and all trip counts are known at compile time, but so far I've only been able to do this by 4 invocations of the -loop-simplify, -loop-rotate, -loop-unroll passes. This has to do with the order that the LoopUnrollPass is applied to the loops. The unroll pass itself wouldn’t be able to control it. The loop tree should be processed bottom-up, so nested loops should be fully unrolled. If you’re not seeing that for an obvious case, please file a bug. -Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrew.kaylor at intel.com Thu Jul 18 12:05:39 2013 From: andrew.kaylor at intel.com (Kaylor, Andrew) Date: Thu, 18 Jul 2013 19:05:39 +0000 Subject: [LLVMdev] LLVM 3.3 JIT code speed In-Reply-To: <9E89BCB4-5B8F-438C-8716-2CE0EAA3757B@grame.fr> References: <6E47DFE5-C272-4AF5-B348-1CCB0ABA5561@grame.fr> <9E89BCB4-5B8F-438C-8716-2CE0EAA3757B@grame.fr> Message-ID: <0983E6C011D2DC4188F8761B533492DE564019C0@ORSMSX104.amr.corp.intel.com> I understand you to mean that you have isolated the actual execution time as your point of comparison, as opposed to including runtime loading and so on. Is this correct? One thing that changed between 3.1 and 3.3 is that MCJIT no longer compiles the module during the engine creation process but instead waits until either a function pointer is requested or finalizeObject is called. 
I would guess that you have taken that into account in your measurement technique, but it seemed worth mentioning. What architecture/OS are you testing? With LLVM 3.3 you can register a JIT event listener (using ExecutionEngine::RegisterJITEventListener) that MCJIT will call with a copy of the actual object image that gets generated. You could then write that image to a file as a basis for comparing the generated code. You can find a reference implementation of the interface in lib/ExecutionEngine/IntelJITEvents/IntelJITEventListener.cpp. -Andy -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Stéphane Letz Sent: Thursday, July 18, 2013 11:20 AM To: Eli Friedman Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] LLVM 3.3 JIT code speed Le 18 juil. 2013 à 19:07, Eli Friedman a écrit : > On Thu, Jul 18, 2013 at 9:07 AM, Stéphane Letz wrote: >> Hi, >> >> Our DSL LLVM IR emitted code (optimized with -O3 kind of IR ==> IR passes) runs slower when executed with the LLVM 3.3 JIT, compared to what we had with LLVM 3.1. What could be the reason? >> >> I tried to play with TargetOptions without any success. >> >> Here is the kind of code we use to allocate the JIT: >> >> EngineBuilder builder(fResult->fModule); >> builder.setOptLevel(CodeGenOpt::Aggressive); >> builder.setEngineKind(EngineKind::JIT); >> builder.setUseMCJIT(true); >> builder.setCodeModel(CodeModel::JITDefault); >> builder.setMCPU(llvm::sys::getHostCPUName()); >> >> TargetOptions targetOptions; >> targetOptions.NoFramePointerElim = true; >> targetOptions.LessPreciseFPMADOption = true; >> targetOptions.UnsafeFPMath = true; >> targetOptions.NoInfsFPMath = true; >> targetOptions.NoNaNsFPMath = true; >> targetOptions.GuaranteedTailCallOpt = true; >> >> builder.setTargetOptions(targetOptions); >> >> TargetMachine* tm = builder.selectTarget(); >> >> fJIT = builder.create(tm); >> if (!fJIT) { >> return false; >> } >> .. >> >> Any idea? 
> > It's hard to say much without seeing the specific IR and the code > generated from that IR. > > -Eli Our language can do either: 1) DSL ==> C/C++ ===> clang/gcc ===> exec code or 2) DSL ==> LLVM IR ===> (optimisation passes) ==> LLVM IR ==> LLVM JIT ==> exec code 1) and 2) were running at the same speed with LLVM 3.1, but 2) is now slower with LLVM 3.3 I compared the LLVM IR that is generated by the 2) chain *after* the optimization passes, with the one that is generated with 1) and clang -emit-llvm -O3 with the pure C input. The two are the same. So my conclusion was that the way we are activating the JIT is no longer correct in 3.3, or we are missing new steps that have to be done in JIT? Stéphane Letz _______________________________________________ LLVM Developers mailing list LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From peter at pcc.me.uk Thu Jul 18 12:13:45 2013 From: peter at pcc.me.uk (Peter Collingbourne) Date: Thu, 18 Jul 2013 12:13:45 -0700 Subject: [LLVMdev] Proposal: function prefix data In-Reply-To: References: <20130718010609.GA17472@pcc.me.uk> Message-ID: <20130718191345.GA17545@pcc.me.uk> On Wed, Jul 17, 2013 at 07:50:58PM -0700, Sean Silva wrote: > On Wed, Jul 17, 2013 at 6:06 PM, Peter Collingbourne wrote: > > > Hi, > > > > I would like to propose that we introduce a mechanism in IR to allow > > arbitrary data to be stashed before a function body. The purpose of > > this would be to allow additional data about a function to be looked > > up via a function pointer. Two use cases come to mind: > > > > 1) We'd like to be able to use UBSan to check that the type of the > > function pointer of an indirect function call matches the type of > > the function being called. This can't really be done efficiently > > without storing type information near the function. > > > > How efficient does it have to be? Have some alternatives already proven to > be "too slow"? (e.g. 
a binary search into a sorted table) This has admittedly not been measured. It depends on the rate at which the program performs indirect function calls. But given the other use cases for this feature we might as well use it in UBSan as opposed to something which is going to be strictly slower. > > 2) Allowing GHC's tables-next-to-code ABI [1] to be implemented. > > In general, I imagine this feature could be useful for the > > implementation of languages which require runtime metadata for > > each function. > > > > The proposal is that an IR function definition acquires a constant > > operand which contains the data to be emitted immediately before > > the function body (known as the prefix data). To access the data > > for a given function, a program may bitcast the function pointer to > > a pointer to the constant's type. This implies that the IR symbol > > points to the start of the prefix data. > > > > To maintain the semantics of ordinary function calls, the prefix data > > must have a particular format. Specifically, it must begin with a > > sequence of bytes which decode to a sequence of machine instructions, > > valid for the module's target, which transfer control to the point > > immediately succeeding the prefix data, without performing any other > > visible action. This allows the inliner and other passes to reason > > about the semantics of the function definition without needing to > > reason about the prefix data. Obviously this makes the format of the > > prefix data highly target dependent. > > > > I'm not sure that something this target dependent is the right choice. Your > example below suggests that the frontend would then need to know magic to > put "raw" in the instruction stream. Have you considered having the feature > expose just the intent "store this data attached to the function, to be > accessed very quickly", and then have an intrinsic > ("llvm.getfuncdata.i{8,16,32,64}"?) which extracts the data in a > target-dependent way? 
The problem is that things like UBSan need to be able to understand the instruction stream anyway (to a certain extent). In UBSan's case, determining at runtime whether a function has prefix data depends on a specific signature of instructions at the start of the program. There are a wide variety of signatures that can be used here and I believe we shouldn't try to constrain the frontend author with a signature (at least partly) of our own design. I think that if someone wants a target-independent way of embedding prefix data it should be done as a library on top of the target-dependent facilities provided in IR. One could imagine a set of routines like this: /// Given some constant data, attach valid prefix data. void attachPrefixData(Function *F, Constant *Data); /// Returns an i1 indicating whether prefix data is present for FP. Value *hasPrefixData(Value *FP); /// Returns a pointer to the prefix data for FP. Value *getPrefixDataPointer(Value *FP, Type *DataType); > Forcing clients to embed deep > target-specific-machine-code knowledge in their frontends seems like a step > in the wrong direction for LLVM. Given a set of routines such as the ones described above, I think we can give frontends a choice of whether to do this or not. Besides, LLVM already contains plenty of target-specific information in its IR. Varargs, inline asm, calling conventions, etc. I don't think making all aspects of the IR target-independent should be a worthwhile goal for LLVM. > > This requirement could be relaxed when combined with my earlier symbol > > offset proposal [2] as applied to functions. However, this is outside > > the scope of the current proposal. > > > > Example: > > > > %0 = type <{ i32, i8* }> > > > > define void @f() prefix %0 <{ i32 1413876459, i8* bitcast ({ i8*, i8* }* > > @_ZTIFvvE to i8*) }> { > > ret void > > } > > > > This is an example of something that UBSan might generate on an > > x86_64 machine. 
It consists of a signature of 4 bytes followed by a > > pointer to the RTTI data for the type 'void ()'. The signature when > > laid out as a little endian 32-bit integer decodes to the instruction > > 'jmp .+0x0c' (which jumps to the instruction immediately succeeding > > the 12-byte prefix) followed by the bytes 'F' and 'T' which identify > > the prefix as a UBSan function type prefix. > > Do you know whether OoO CPUs will still attempt to decode the "garbage" in > the instruction stream, even if there is a jump over it? (IIRC they will > decode ahead of the PC and hiccup (but not fault) on garbage). Maybe it > would be better to steganographically encode the value inside the > instruction stream? On x86 you could use 48b8 which only has 2 bytes > overhead for an i64 (putting a move like that, which moves into a > caller-saved register on entry, would effectively be a noop). On the contrary, I think this is a good argument for allowing (not forcing) frontends to encode the prefix data as they please, thus enabling this kind of creativity. > This is some > pretty gnarly target-dependent stuff which seems like it would best be > hidden in the backend (e.g. architectures that have "constant island"-like > passes might want to stash the data in there instead). I think that adding support for things like constant islands is something that can be added incrementally at a later stage. One could consider for example an additional llvm::Function field which specifies the number of bytes that the backend may use at the beginning of the function such that the prefix data may be of any format. (Once this is in place the aforementioned library routines could become relatively trivial.) The backend could use this space to, say, insert a relative branch that skips the prefix data and a first constant island. 
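To make the 12-byte layout above concrete, here is a self-contained sketch of the runtime check (illustrative C++ only, not LLVM API; the helper names are hypothetical, and it assumes a little-endian host where the function symbol points at the start of the prefix data, as described above):

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper, not LLVM API. Under the proposal the function symbol
// points at the start of the prefix data, so a runtime check can read the
// first 4 bytes directly. The constant 1413876459 (0x54460AEB) is the
// little-endian encoding of "jmp .+0x0c" (EB 0A) followed by 'F' and 'T'.
static bool hasUBSanTypePrefix(const unsigned char *Sym) {
  uint32_t Sig;
  std::memcpy(&Sig, Sym, sizeof(Sig)); // unaligned-safe read
  return Sig == 1413876459u;
}

// If the signature matches, the RTTI pointer sits immediately after it
// (offset 4 in the packed <{ i32, i8* }> layout from the example).
static const void *typeInfoFor(const unsigned char *Sym) {
  if (!hasUBSanTypePrefix(Sym))
    return nullptr;
  const void *TI;
  std::memcpy(&TI, Sym + 4, sizeof(TI));
  return TI;
}
```

The comparison succeeds exactly when the first four bytes are EB 0A 'F' 'T', i.e. the 'jmp .+0x0c' signature from the example.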
Thanks, -- Peter From peter at pcc.me.uk Thu Jul 18 12:14:59 2013 From: peter at pcc.me.uk (Peter Collingbourne) Date: Thu, 18 Jul 2013 12:14:59 -0700 Subject: [LLVMdev] Proposal: function prefix data In-Reply-To: References: <20130718010609.GA17472@pcc.me.uk> Message-ID: <20130718191459.GB17545@pcc.me.uk> On Thu, Jul 18, 2013 at 12:45:32PM -0400, Jevin Sweval wrote: > On Wed, Jul 17, 2013 at 9:06 PM, Peter Collingbourne wrote: > > > > To maintain the semantics of ordinary function calls, the prefix data > > must have a particular format. Specifically, it must begin with a > > sequence of bytes which decode to a sequence of machine instructions, > > valid for the module's target, which transfer control to the point > > immediately succeeding the prefix data, without performing any other > > visible action. This allows the inliner and other passes to reason > > about the semantics of the function definition without needing to > > reason about the prefix data. Obviously this makes the format of the > > prefix data highly target dependent. > > > What if the prefix data was stored before the start of the function > code? The function's symbol will point to the code just as before, > eliminating the need to have instructions that skip the prefix data. > > It would look something like: > | Prefix Data ... (variable length) | Prefix Data Length (fixed length > [32 bits?]) | Function code .... | > > ^ function symbol points here (function code) > > I hope the simple ASCII art makes it through my mail client. > > To access the data, you do > > prefix_data = function_ptr - sizeof(prefix_length) - prefix_length A similar scheme is described in the next paragraph of my email: > > This requirement could be relaxed when combined with my earlier symbol > > offset proposal [2] as applied to functions. However, this is outside > > the scope of the current proposal. 
Unfortunately, this won't work for UBSan, as it needs to be able to take an arbitrary function pointer and determine whether the prefix data is present. If the function lives at the beginning of a segment boundary and does not have prefix data, a segfault may occur when attempting to access the prefix data. It should definitely work for GHC though (and is how I understand the tables-next-to-code ABI to be implemented in its non-LLVM backend). Thanks, -- Peter From t.p.northover at gmail.com Thu Jul 18 12:18:06 2013 From: t.p.northover at gmail.com (Tim Northover) Date: Thu, 18 Jul 2013 20:18:06 +0100 Subject: [LLVMdev] Compile Error SVN In-Reply-To: References: Message-ID: Hi Brian, > Did I do something wrong? Are you sure your clang and LLVM are at the same revision? That line from FileSystemStatCache.cpp was added a few days ago, on the same day that the function itself seems to have been added (16th Jul) so if your LLVM hasn't been updated things won't work. Incidentally, I can strongly recommend "git blame" to tell you reasonably accurately where problematic lines come from (and it looks like "svn blame" exists too). Tim. From tghardin1 at catamount.wcu.edu Thu Jul 18 12:23:19 2013 From: tghardin1 at catamount.wcu.edu (Tyler Hardin) Date: Thu, 18 Jul 2013 15:23:19 -0400 Subject: [LLVMdev] Proposal: function prefix data In-Reply-To: References: <20130718010609.GA17472@pcc.me.uk> Message-ID: As much as I like this idea for its use in languages with type systems like Haskell and Scheme, this proposal would limit LLVM to non-Harvard architectures. That's generally a really small minority of all processors, but it would mean there could never be a clang-avr. An alternative you could use is, instead of using the function pointer as the variable where you are referring to a function, you could have the variable be a pointer to a static struct with the data and the actual function pointer. Basically, it's like how static class variables are handled in C++. 
I don't know LLVM IR, so I'll use C to explain. Instead of this: void func(void){} int main(){ func(); return 0; } You could do this: void func(void){} struct func_data_t { char* data; int len; void (*ptr)(void); }; /* Initialized at compile time. */ static struct func_data_t func_data = { 0, 0, func }; int main(){ func_data.ptr(); return 0; } On Jul 18, 2013 12:47 PM, "Jevin Sweval" wrote: > On Wed, Jul 17, 2013 at 9:06 PM, Peter Collingbourne > wrote: > > > > To maintain the semantics of ordinary function calls, the prefix data > > must have a particular format. Specifically, it must begin with a > > sequence of bytes which decode to a sequence of machine instructions, > > valid for the module's target, which transfer control to the point > > immediately succeeding the prefix data, without performing any other > > visible action. This allows the inliner and other passes to reason > > about the semantics of the function definition without needing to > > reason about the prefix data. Obviously this makes the format of the > > prefix data highly target dependent. > > > What if the prefix data was stored before the start of the function > code? The function's symbol will point to the code just as before, > eliminating the need to have instructions that skip the prefix data. > > It would look something like: > | Prefix Data ... (variable length) | Prefix Data Length (fixed length > [32 bits?]) | Function code .... | > > ^ function symbol points here (function code) > > I hope the simple ASCII art makes it through my mail client. > > To access the data, you do > > prefix_data = function_ptr - sizeof(prefix_length) - prefix_length > > Cheers, > Jevin > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From peter at pcc.me.uk Thu Jul 18 12:30:59 2013 From: peter at pcc.me.uk (Peter Collingbourne) Date: Thu, 18 Jul 2013 12:30:59 -0700 Subject: [LLVMdev] Proposal: function prefix data In-Reply-To: References: <20130718010609.GA17472@pcc.me.uk> Message-ID: <20130718193059.GA32482@pcc.me.uk> On Thu, Jul 18, 2013 at 03:23:19PM -0400, Tyler Hardin wrote: > As much as I like this idea for its use in languages with type systems > like Haskell and Scheme, this proposal would limit LLVM to non-Harvard > architectures. That's generally a really small minority of all processors, > but it would mean there could never be a clang-avr. Not really. It just would mean that the prefix data feature would not exist (or at least would not be usable) on such architectures. > An alternative you could use is, instead of using the function pointer as > the variable where you are referring to a function, you could have the > variable be a pointer to a static struct with the data and the actual > function pointer. Basically, it's like how static class variables are > handled in C++. > > I don't know LLVM IR, so I'll use C to explain. > > Instead of this: > > void func(void){} > > int main(){ > func(); > return 0; > } > > You could do this: > > void func(void){} > > struct func_data_t { > char* data; > int len; > void (*ptr)(void); > }; > > /* Initialized at compile time. */ > static struct func_data_t func_data = { 0, 0, func }; > > int main(){ > func_data.ptr(); > return 0; > } You could certainly use something like this to implement runtime function metadata on Harvard architectures (the existing LLVM GHC backend must already do something like this), and (optionally, as an optimisation) prefix data on all other architectures. 
Thanks, -- Peter From jevinsweval at gmail.com Thu Jul 18 12:55:40 2013 From: jevinsweval at gmail.com (Jevin Sweval) Date: Thu, 18 Jul 2013 15:55:40 -0400 Subject: [LLVMdev] Proposal: function prefix data In-Reply-To: References: <20130718010609.GA17472@pcc.me.uk> Message-ID: On Thu, Jul 18, 2013 at 3:23 PM, Tyler Hardin wrote: > As much as I like this idea for its use in languages with type systems like > Haskell and Scheme, this proposal would limit LLVM to non-Harvard > architectures. That's generally a really small minority of all processors, > but it would mean there could never be a clang-avr. Are there any Harvard architectures that clang supports that don't provide instructions to read program memory? AVR has the LPM instruction. Cheers, Jevin From silvas at purdue.edu Thu Jul 18 14:29:14 2013 From: silvas at purdue.edu (Sean Silva) Date: Thu, 18 Jul 2013 14:29:14 -0700 Subject: [LLVMdev] [cfe-dev] [RFC] add Function Attribute to disable optimization In-Reply-To: References: <51BF7516.4070800@mxc.ca> Message-ID: On Thu, Jul 18, 2013 at 8:23 AM, wrote: > So.. > I have investigated more on how a new function attribute to disable > optimization on a per-function basis could be implemented. > At the current state, with the lack of specific support from the pass > managers I found two big problems when trying to implement a prototype > implementation of the new attribute. > > Here are the problems found: > 1) It is not safe to disable some transform passes in the backend. > It looks like there are some unwritten dependences between passes and > disrupting the sequence of passes to run may result in unexpected crashes > and/or assertion failures; > This sounds like a bug. It's probably worth bringing up as its own discussion on llvmdev if it is extremely prevalent, or file PRs (or send patches fixing it!) if it is just a few isolated cases. 
> 2) The fact that pass managers are not currently designed to support > per-function optimization makes it difficult to find a reasonable way to > implement this new feature. > > About point 2, the idea that came to my mind consisted of making passes > aware of the 'noopt' attribute. > In my experiment: > - I added a virtual method called 'mustAlwaysRun' in class Pass that > 'returns true if it is not safe to disable this pass'. > If a pass does not override the default implementation of that method, > then by default it will always return true (i.e. the pass "must > always run" even when attribute 'noopt' is specified). > - I then overrode that method on all the optimization passes > that could have been safely turned off when attribute noopt was present. > In my experiment, I specifically didn't disable Module Passes; > - Then I modified the 'doInitialize()' 'run*()' and 'doFinalize' methods > in the Pass Manager to check for both the presence of attribute noopt AND the > value returned by method 'mustAlwaysRun' called on the current pass > instance. > > That experiment seemed to "work" on a few tests and benchmarks. > However: > a) 'noopt' wouldn't really imply no optimization, since not all codegen > optimization passes can be safely disabled. As a result, the assembly > produced for noopt functions had few differences with respect to the > assembly generated for the same functions at -O0; > b) I don't particularly like the idea of making passes "aware" of the > 'noopt' attribute. However, I don't know if there is a reasonable way to > implement the noopt attribute without having to re-design how pass > managers work. > A redesign of the pass manager has been on the table for a while and the need seems more pressing daily. Definitely make sure that this use case is brought up in any design discussions for the new pass manager. -- Sean Silva > c) Because of a. and b., I am concerned that a change like the one > described above won't be accepted. 
If so however, I would be really > interested in the feedback from the community. Maybe there are better ways > to implement 'noopt' which I don't know/didn't think about? > > As I said, I am not very happy with the proposed solution and any feedback > would be really appreciated at this point. > > By the way, here is how I thought the 'noopt' proposal could have been > contributed in terms of patches: > > [LLVM IR][Patch 1] > ================ > This patch extends the IR adding a new attribute called 'noopt'. > Below, is a sequence of steps which describes how to implement this patch. > > 1) Add a definition for attribute 'noopt' in File llvm/IR/Attribute.h; > 2) Teach how attribute 'noopt' should be encoded and also how to print it > out > as a string value (File lib/IR/Attributes.cpp); > 2b) Add a new enum value for the new attribute in enum LLVMAttribute > (File "include/llvm-c/Core.h"); > 3) The new attribute is a function attribute; > Teach the verifier pass that 'noopt' is a function attribute; > Add checks in method VerifyAttributeTypes() (File lib/IR/Verifier.cpp): > * NoOpt is a function-only attribute; > * Assert if NoOpt is used in the same context as alwaysinline; > * Assert if NoOpt is used in the same context as OptimizeForSize > (needed?); > * Assert if NoOpt is used in the same context as MinSize (needed?). > 4) Add a LLVM test in test/Feature to verify that we correctly disassemble > the new function attribute (see for example file cold.ll); > 5) Teach the AsmParser how to parse the new attribute: > * Add a new token for the new attribute noopt; > * Add rules to parse the new token; > 6) Add a description of the new attribute in "docs/LangRef.rst"; > > [LLVM][Opt][Patch 2] > ================== > This patch implements the required changes to passes and pass managers. > Below, is a sequence of steps which describes how to implement this patch. > > 1) Make the new inliner aware of the new flag. 
> * In lib/Transforms/IPO/Inliner.cpp: > ** do not inline the callee if it is not always_inline and the caller > is marked 'noopt'. > * No other changes are required since 'noopt' already implies 'noinline'. > 2) Tell the pass manager which transform passes can be safely disabled > with 'noopt'. > > [CLANG][Patch 3] > =============== > This patch teaches clang how to parse and generate code for functions that > are marked with attribute 'noopt'. > > 1) Lex > * Add a new token for the 'noopt' keyword. > * That keyword is for a function attribute. > 2) Sema > * Add a rule to handle the case where noopt is passed as function > attribute. > * check that the attribute does not take extra arguments. > * check that the attribute is associated with a function declaration. > * Add the attribute to the IR Set of Attributes. > 3) CodeGen > * noopt implies 'noinline'. > * noopt always wins over always_inline > * noopt does not win over 'naked': naked functions only contain asm > statements. This attribute is only valid for ARM, AVR, MCORE, RL78, RX > and > SPU to indicate that the specified function does not need > prologue/epilogue > sequence generated by the compiler. (NOTE: this constraint can be > removed). > 4) Add clang tests: > * in test/Sema: > ** Verify that noopt only applies to functions. (-cc1 -fsyntax-only > -verify) > * in test/CodeGen: > ** Check that noopt implies noinline > ** Check combinations of noopt and noinline and always_inline > > Andrea Di Biagio > SN Systems - Sony Computer Entertainment Group. 
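As a self-contained illustration of the pass-manager check from the 'mustAlwaysRun' experiment described above (these class names are simplified stand-ins for illustration, not the real LLVM Pass hierarchy):

```cpp
#include <vector>

// Simplified stand-ins, NOT the real LLVM Pass/Function classes.
struct Function {
  bool HasNoOpt = false; // models the proposed 'noopt' attribute
};

struct Pass {
  virtual ~Pass() {}
  // Returns true if it is not safe to disable this pass; a pass that does
  // not override this therefore always runs, even under 'noopt'.
  virtual bool mustAlwaysRun() const { return true; }
  virtual void run(Function &F) { (void)F; ++Runs; }
  int Runs = 0; // counts how often the pass actually ran
};

struct DisableableOptPass : Pass {
  bool mustAlwaysRun() const override { return false; }
};

// The pass-manager side of the experiment: skip a pass only when the
// function is marked 'noopt' AND the pass says it is safe to disable.
static void runOnFunction(const std::vector<Pass *> &Passes, Function &F) {
  for (Pass *P : Passes)
    if (!F.HasNoOpt || P->mustAlwaysRun())
      P->run(F);
}
```

With this scheme a pass that does not override mustAlwaysRun() keeps running even for 'noopt' functions, matching the conservative default described in the experiment.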
> > Andrea DiBiagio/SN R&D/BS/UK/SCEE wrote on 25/06/2013 15:20:12: > > > From: Andrea DiBiagio/SN R&D/BS/UK/SCEE > > To: Nick Lewycky > > Cc: cfe-dev at cs.uiuc.edu, llvmdev at cs.uiuc.edu > > Date: 25/06/2013 15:20 > > Subject: Re: [LLVMdev] [RFC] add Function Attribute to disable > optimization > > > > Hi Nick, > > > > > From: Nick Lewycky > > > > This proposal is to create a new function-level attribute which > would tell > > > > the compiler to not to perform any optimizing transformations on the > > > > specified function. > > > > > > What about module passes? Do you want to disable all module passes in > a > > > TU which contains a single one of these? I'll be unhappy if we need to > > > > litter checks throughout the module passes that determine whether a > > > given instruction is inside an unoptimizable function or not. Saying > > > that module passes are exempt from checking the 'noopt' attribute is > > > fine to me, but then somebody needs to know how to module passes (and > > > users may be surprised to discover that adding such an annotation to > one > > > function will cause seemingly-unrelated functions to become less > optimized). > > > Right, module passes are a difficult case. > > I understand your point. I think ignoring the `noopt' attribute (or > > whatever we want to call it) may be the best approach in this case: > > it avoid the problems you describe but should still be sufficient > > for the purposes we care about. I am currently studying the module > > passes in more details to be certain about this. 
> > > Thanks for the useful feedback, > > Andrea Di Biagio > > SN Systems - Sony Computer Entertainment Group > > > > > > The use-case is to be able to selectively disable optimizations when > > > > debugging a small number of functions in a compilation unit to > provide an > > > > -O0-like quality of debugging in cases where compiling the whole > unit at > > > > anything less than full optimization would make the program run too > > > > slowly. A useful secondary-effect of this feature would be to allow > users > > > > to temporarily work-around optimization bugs in LLVM without having > to > > > > reduce the optimization level for the whole compilation unit, > however we > > > > do not consider this the most important use-case. > > > > > > > > Our suggestion for the name for this attribute is "optnone" which > seems to > > > > be in keeping with the existing "optsize" attribute, although it > could > > > > equally be called "noopt" or something else entirely. It would be > exposed > > > > to Clang users through __attribute__((optnone)) or [[optnone]]. > > > > > > > > I would like to discuss this proposal with the rest of the community > to > > > > share opinions and have feedback on this. > > > > > > > > =================================================== > > > > Interactions with the existing function attributes: > > > > > > > > LLVM allows to decorate functions with 'noinline', alwaysinline' and > > > > 'inlinehint'. We think that it makes sense for 'optnone' to > implicitly > > > > imply 'noinline' (at least from a user's point of view) and > therefore > > > > 'optnone' should be considered incompatible with 'alwaysinline' and > > > > 'inlinehint'. > > > > > > > > Example: > > > > __attribute__((optnone, always_inline)) > > > > void foo() { ... } > > > > > > > > In this case we could make 'optnone' override 'alwaysinline'. The > effect > > > > would be that 'alwaysinline' wouldn't appear in the IR if 'optnone' > is > > > > specified. 
> > > > > > > > Under the assumption that 'optnone' implies 'noinline', other things > that > > > > should be taken into account are: > > > > 1) functions marked as 'optnone' should never be considered as > potential > > > > candidates for inlining; > > > > 2) the inliner shouldn't try to inline a function if the call site > is in a > > > > 'optnone' function. > > > > > > > > Point 1 can be easily achieved by simply pushing attribute > 'noinline' on > > > > the list of function attributes if 'optnone' is used. > > > > point 2 however would probably require to teach the Inliner about > > > > 'optnone' and how to deal with it. > > > > > > > > As in the case of 'alwaysinline' and 'inlinehint', I think 'optnone' > > > > should also override 'optsize'/'minsize'. > > > > > > > > Last (but not least), implementing 'optnone' would still require > changes > > > > in how optimizations are run on functions. This last part is > probably the > > > > hardest part since the current optimizer does not allow the level of > > > > flexibility required by 'optnone'. It seems it would either require > some > > > > modifications to the Pass Manager or we would have to make > individual > > > > passes aware of the attribute. Neither of these solutions seem > > > > particularly attractive to me, so I'm open to any suggestions! > > > > > > > > Thanks, > > > > Andrea Di Biagio > > > > SN Systems - Sony Computer Entertainment Group > > > > > > > > > > > > > ********************************************************************** > > > > This email and any files transmitted with it are confidential and > intended > > > > solely for the use of the individual or entity to whom they are > addressed. > > > > If you have received this email in error please notify > postmaster at scee.net > > > > This footnote also confirms that this email message has been checked > for > > > > all known viruses. 
> > > > Sony Computer Entertainment Europe Limited > > > > Registered Office: 10 Great Marlborough Street, London W1F 7LP, > United > > > > Kingdom > > > > Registered in England: 3277793 > > > > > ********************************************************************** > > > > > > > > P Please consider the environment before printing this e-mail > > > > _______________________________________________ > > > > LLVM Developers mailing list > > > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > _______________________________________________ > cfe-dev mailing list > cfe-dev at cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From letz at grame.fr Thu Jul 18 14:51:03 2013 From: letz at grame.fr (=?iso-8859-1?Q?St=E9phane_Letz?=) Date: Thu, 18 Jul 2013 23:51:03 +0200 Subject: [LLVMdev] LLVM 3.3 JIT code speed In-Reply-To: <0983E6C011D2DC4188F8761B533492DE564019C0@ORSMSX104.amr.corp.intel.com> References: <6E47DFE5-C272-4AF5-B348-1CCB0ABA5561@grame.fr> <9E89BCB4-5B8F-438C-8716-2CE0EAA3757B@grame.fr> <0983E6C011D2DC4188F8761B533492DE564019C0@ORSMSX104.amr.corp.intel.com> Message-ID: <37D17E8A-300B-40EC-B96F-B12044C28D3A@grame.fr> On 18 July 2013, at 21:05, "Kaylor, Andrew" wrote: > I understand you to mean that you have isolated the actual execution time as your point of comparison, as opposed to including runtime loading and so on. Is this correct? We are testing actual execution time, yes: the time used in a given JIT-compiled function. > > > One thing that changed between 3.1 and 3.3 is that MCJIT no longer compiles the module during the engine creation process but instead waits until either a function pointer is requested or finalizeObject is called. I would guess that you have taken that into account in your measurement technique, but it seemed worth mentioning. OK, so I guess our testing is then correct since we are testing actual execution time of the function pointer. > > > What architecture/OS are you testing? 

64 bits OSX (10.8.4) > > With LLVM 3.3 you can register a JIT event listener (using ExecutionEngine::RegisterJITEventListener) that MCJIT will call with a copy of the actual object image that gets generated. You could then write that image to a file as a basis for comparing the generated code. You can find a reference implementation of the interface in lib/ExecutionEngine/IntelJITEvents/IntelJITEventListener.cpp. Thanks I'll have a look. > > -Andy > Stéphane From silvas at purdue.edu Thu Jul 18 14:59:42 2013 From: silvas at purdue.edu (Sean Silva) Date: Thu, 18 Jul 2013 14:59:42 -0700 Subject: [LLVMdev] Proposal: function prefix data In-Reply-To: <20130718191345.GA17545@pcc.me.uk> References: <20130718010609.GA17472@pcc.me.uk> <20130718191345.GA17545@pcc.me.uk> Message-ID: On Thu, Jul 18, 2013 at 12:13 PM, Peter Collingbourne wrote: > On Wed, Jul 17, 2013 at 07:50:58PM -0700, Sean Silva wrote: > > On Wed, Jul 17, 2013 at 6:06 PM, Peter Collingbourne >wrote: > > > > > Hi, > > > > > > I would like to propose that we introduce a mechanism in IR to allow > > > arbitrary data to be stashed before a function body. The purpose of > > > this would be to allow additional data about a function to be looked > > > up via a function pointer. Two use cases come to mind: > > > > > > 1) We'd like to be able to use UBSan to check that the type of the > > > function pointer of an indirect function call matches the type of > > > the function being called. This can't really be done efficiently > > > without storing type information near the function. > > > > > > > How efficient does it have to be? Have some alternatives already proven > to > > be "too slow"? (e.g. a binary search into a sorted table) > > This has admittedly not been measured. It depends on the rate at > which the program performs indirect function calls. But given the > other use cases for this feature So far you only seem to have presented the GHC ABI use case (and see just below about UBSan). 
Do you know of any other use cases? > we might as well use it in UBSan as > opposed to something which is going to be strictly slower. > Below you use UBSan's use case as a motivating example, which seems incongruous to this "might as well use it in UBSan" attitude. Without having evaluated alternatives as being "too slow" for UBSan, I don't think that UBSan's use case should be used to drive this proposal. > > > > 2) Allowing GHC's tables-next-to-code ABI [1] to be implemented. > > > In general, I imagine this feature could be useful for the > > > implementation of languages which require runtime metadata for > > > each function. > > > > > > The proposal is that an IR function definition acquires a constant > > > operand which contains the data to be emitted immediately before > > > the function body (known as the prefix data). To access the data > > > for a given function, a program may bitcast the function pointer to > > > a pointer to the constant's type. This implies that the IR symbol > > > points to the start of the prefix data. > > > > > > To maintain the semantics of ordinary function calls, the prefix data > > > must have a particular format. Specifically, it must begin with a > > > sequence of bytes which decode to a sequence of machine instructions, > > > valid for the module's target, which transfer control to the point > > > immediately succeeding the prefix data, without performing any other > > > visible action. This allows the inliner and other passes to reason > > > about the semantics of the function definition without needing to > > > reason about the prefix data. Obviously this makes the format of the > > > prefix data highly target dependent. > > > > > > > I'm not sure that something this target dependent is the right choice. > Your > > example below suggests that the frontend would then need to know magic to > > put "raw" in the instruction stream. 
Have you considered having the > feature > > expose just the intent "store this data attached to the function, to be > > accessed very quickly", and then have an intrinsic > > ("llvm.getfuncdata.i{8,16,32,64}"?) which extracts the data in a > > target-dependent way? > > The problem is that things like UBSan need to be able to understand > the instruction stream anyway (to a certain extent). In UBSan's case, > determining at runtime whether a function has prefix data depends on > a specific signature of instructions at the start of the program. > There are a wide variety of signatures that can be used here and > I believe we shouldn't try to constrain the frontend author with a > signature (at least partly) of our own design. > > I think that if someone wants a target-independent way of > embedding prefix data it should be done as a library on top of the > target-dependent facilities provided in IR. One could imagine a set > of routines like this: > > /// Given some constant data, attach valid prefix data. > void attachPrefixData(Function *F, Constant *Data); > > /// Returns an i1 indicating whether prefix data is present for FP. > Value *hasPrefixData(Value *FP); > > /// Returns a pointer to the prefix data for FP. > Value *getPrefixDataPointer(Value *FP, Type *DataType); > > > Forcing clients to embed deep > > target-specific-machine-code knowledge in their frontends seems like a > step > > in the wrong direction for LLVM. > > Given a set of routines such as the ones described above, I think we > can give frontends a choice of whether to do this or not. Besides, > LLVM already contains plenty of target-specific information in its IR. > Varargs, inline asm, calling conventions, etc. I don't think making > all aspects of the IR target-independent should be a worthwhile goal > for LLVM. > I don't have an issue with target-dependent things per se; I just think that they should be given a bit more thought and not added unless existing mechanisms are insufficient. 
For example, could this be implemented as a late IR pass that adds a piece of inline-asm to the beginning of the function? -- Sean Silva > > > > This requirement could be relaxed when combined with my earlier symbol > > > offset proposal [2] as applied to functions. However, this is outside > > > the scope of the current proposal. > > > > > > Example: > > > > > > %0 = type <{ i32, i8* }> > > > > > > define void @f() prefix %0 <{ i32 1413876459, i8* bitcast ({ i8*, i8* > }* > > > @_ZTIFvvE to i8*) }> { > > > ret void > > > } > > > > > > This is an example of something that UBSan might generate on an > > > x86_64 machine. It consists of a signature of 4 bytes followed by a > > > pointer to the RTTI data for the type 'void ()'. The signature when > > > laid out as a little endian 32-bit integer decodes to the instruction > > > 'jmp .+0x0c' (which jumps to the instruction immediately succeeding > > > the 12-byte prefix) followed by the bytes 'F' and 'T' which identify > > > the prefix as a UBSan function type prefix. > > > > > > > Do you know whether OoO CPU's will still attempt to decode the "garbage" > in > > the instruction stream, even if there is a jump over it? (IIRC they will > > decode ahead of the PC and hiccup (but not fault) on garbage). Maybe it > > would be better to steganographically encode the value inside the > > instruction stream? On x86 you could use 48b8 which only has 2 > bytes > > overhead for an i64 (putting a move like that, which moves into a > > caller-saved register on entry, would effectively be a noop). > > On the contrary, I think this is a good argument for allowing > (not forcing) frontends to encode the prefix data as they please, > thus enabling this kind of creativity. > > > This is some > > pretty gnarly target-dependent stuff which seems like it would best be > > hidden in the backend (e.g. architectures that have "constant > island"-like > > passes might want to stash the data in there instead). 
> > I think that adding support for things like constant islands is > something that can be added incrementally at a later stage. One could > consider for example an additional llvm::Function field which specifies > the number of bytes that the backend may use at the beginning of the > function such that the prefix data may be of any format. (Once this > is in place the aforementioned library routines could become relatively > trivial.) The backend could use this space to, say, insert a relative > branch that skips the prefix data and a first constant island. > > Thanks, > -- > Peter > -------------- next part -------------- An HTML attachment was scrubbed... URL: From callum.a.rogers at gmail.com Thu Jul 18 16:23:37 2013 From: callum.a.rogers at gmail.com (Callum Rogers) Date: Fri, 19 Jul 2013 00:23:37 +0100 Subject: [LLVMdev] Windows Binaries for 3.3 - I've built some (including shared library) Message-ID: <51E878F9.5040602@gmail.com> I've seen a lot people trying to find Windows binaries/tripping up on the Windows compilation process recently which has not been helped by the releases part of the website still lacking Windows binaries. So I went and built some and put them in this repo: https://github.com/CRogers/LLVM-Windows-Binaries/releases, although they are built using VC++ instead of mingw32 (which may be better, due to fewer dependencies). It even has the elusive Windows shared library that is officially unsupported as part of the build process but is required for bindings that use the C API (such as the fantastic llvm-fs). Enjoy! I am considering compiling clang for Windows too, if I have the time and there is interest. Cheers, Callum From ruiu at google.com Thu Jul 18 16:29:59 2013 From: ruiu at google.com (Rui Ueyama) Date: Thu, 18 Jul 2013 16:29:59 -0700 Subject: [LLVMdev] Debugging buildbot failure Message-ID: Hi LLVMdev, My recent commit r186623 caused buildbots for s390-linux and ppc64-linux to fail. 
I rolled back that commit, and I'm trying to fix to re-submit. Is there any good way to debug the issue which does not occur on my local x86-64 machine? I don't have S390 nor PPC64 machines. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rkotler at mips.com Thu Jul 18 16:29:56 2013 From: rkotler at mips.com (reed kotler) Date: Thu, 18 Jul 2013 16:29:56 -0700 Subject: [LLVMdev] issues for mac os building llvm? Message-ID: <51E87A74.9040308@mips.com> I built llvm and clang on my home mac which has just a normal mac os file system and everything seem to build just fine. Are there any requirements for needing linux style upper/lowercase file systems for llvm/clang tool chains? From eli.friedman at gmail.com Thu Jul 18 16:38:41 2013 From: eli.friedman at gmail.com (Eli Friedman) Date: Thu, 18 Jul 2013 16:38:41 -0700 Subject: [LLVMdev] Debugging buildbot failure In-Reply-To: References: Message-ID: On Thu, Jul 18, 2013 at 4:29 PM, Rui Ueyama wrote: > Hi LLVMdev, > > My recent commit r186623 caused buildbots for s390-linux and ppc64-linux to > fail. I rolled back that commit, and I'm trying to fix to re-submit. > > Is there any good way to debug the issue which does not occur on my local > x86-64 machine? I don't have S390 nor PPC64 machines. If it's failing on S390 and PPC64 in particular, your code probably accidentally depends on the host endianness. (Looking at the commit in question seems to confirm that.) -Eli From eli.friedman at gmail.com Thu Jul 18 16:39:34 2013 From: eli.friedman at gmail.com (Eli Friedman) Date: Thu, 18 Jul 2013 16:39:34 -0700 Subject: [LLVMdev] issues for mac os building llvm? In-Reply-To: <51E87A74.9040308@mips.com> References: <51E87A74.9040308@mips.com> Message-ID: On Thu, Jul 18, 2013 at 4:29 PM, reed kotler wrote: > I built llvm and clang on my home mac which has just a normal mac os file > system and everything seem to build just fine. 
> > Are there any requirements for needing linux style upper/lowercase file > systems for llvm/clang tool chains? Umm, are you unhappy that it works? :) -Eli From criswell at illinois.edu Thu Jul 18 16:41:59 2013 From: criswell at illinois.edu (John Criswell) Date: Thu, 18 Jul 2013 18:41:59 -0500 Subject: [LLVMdev] issues for mac os building llvm? In-Reply-To: <51E87A74.9040308@mips.com> References: <51E87A74.9040308@mips.com> Message-ID: <51E87D47.7050105@illinois.edu> On 7/18/13 6:29 PM, reed kotler wrote: > I built llvm and clang on my home mac which has just a normal mac os > file system and everything seem to build just fine. > > Are there any requirements for needing linux style upper/lowercase > file systems for llvm/clang tool chains? In my experience, no. -- John T. > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From ruiu at google.com Thu Jul 18 16:51:29 2013 From: ruiu at google.com (Rui Ueyama) Date: Thu, 18 Jul 2013 16:51:29 -0700 Subject: [LLVMdev] Debugging buildbot failure In-Reply-To: References: Message-ID: Yes, it's very likely. It'd be very convenient if we could send a patch to buildbots for testing without actually submitting. On Thu, Jul 18, 2013 at 4:38 PM, Eli Friedman wrote: > On Thu, Jul 18, 2013 at 4:29 PM, Rui Ueyama wrote: > > Hi LLVMdev, > > > > My recent commit r186623 caused buildbots for s390-linux and ppc64-linux > to > > fail. I rolled back that commit, and I'm trying to fix to re-submit. > > > > Is there any good way to debug the issue which does not occur on my local > > x86-64 machine? I don't have S390 nor PPC64 machines. > > If it's failing on S390 and PPC64 in particular, your code probably > accidentally depends on the host endianness. (Looking at the commit > in question seems to confirm that.) > > -Eli > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rkotler at mips.com Thu Jul 18 16:51:09 2013 From: rkotler at mips.com (reed kotler) Date: Thu, 18 Jul 2013 16:51:09 -0700 Subject: [LLVMdev] issues for mac os building llvm? In-Reply-To: References: <51E87A74.9040308@mips.com> Message-ID: <51E87F6D.70900@mips.com> On 07/18/2013 04:39 PM, Eli Friedman wrote: > On Thu, Jul 18, 2013 at 4:29 PM, reed kotler wrote: >> I built llvm and clang on my home mac which has just a normal mac os file >> system and everything seem to build just fine. >> >> Are there any requirements for needing linux style upper/lowercase file >> systems for llvm/clang tool chains? > Umm, are you unhappy that it works? :) > > -Eli I just bought an 11" mac air for when I'm travelling or somewhere else that is not at work and wanted to be able to do llvm work on it. I got the the 8 gig version but with a 128 gig flash drive because it seemed to be enough but if I had to create another partition for llvm, then maybe I should get the 256 gig version. It's not really a work computer but I wanted to be able to do some things on it. From swlin at post.harvard.edu Thu Jul 18 16:58:47 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Thu, 18 Jul 2013 16:58:47 -0700 Subject: [LLVMdev] issues for mac os building llvm? In-Reply-To: <51E87F6D.70900@mips.com> References: <51E87A74.9040308@mips.com> <51E87F6D.70900@mips.com> Message-ID: > I just bought an 11" mac air for when I'm travelling or somewhere else that > is not at work and wanted to be able to do llvm work on it. > > I got the the 8 gig version but with a 128 gig flash drive because it seemed > to be enough but if I had to create another partition for llvm, then maybe I > should get the 256 gig version. > > It's not really a work computer but I wanted to be able to do some things on > it. 
I think a lot of people that work on llvm would be in trouble if it didn't build on OS X :) -Stephen From swlin at post.harvard.edu Thu Jul 18 17:02:10 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Thu, 18 Jul 2013 17:02:10 -0700 Subject: [LLVMdev] Debugging buildbot failure In-Reply-To: References: Message-ID: Even better, someone ought to implement a virtual machine architecture that purposely broke every possible non-portable assumption in unexpected and unpredictable ways, as an acid test environment for all cross-platform projects :) -Stephen On Thu, Jul 18, 2013 at 4:51 PM, Rui Ueyama wrote: > Yes, it's very likely. It'd be very convenient if we could send a patch to > buildbots for testing without actually submitting. > > > On Thu, Jul 18, 2013 at 4:38 PM, Eli Friedman > wrote: >> >> On Thu, Jul 18, 2013 at 4:29 PM, Rui Ueyama wrote: >> > Hi LLVMdev, >> > >> > My recent commit r186623 caused buildbots for s390-linux and ppc64-linux >> > to >> > fail. I rolled back that commit, and I'm trying to fix to re-submit. >> > >> > Is there any good way to debug the issue which does not occur on my >> > local >> > x86-64 machine? I don't have S390 nor PPC64 machines. >> >> If it's failing on S390 and PPC64 in particular, your code probably >> accidentally depends on the host endianness. (Looking at the commit >> in question seems to confirm that.) >> >> -Eli > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From chandlerc at google.com Thu Jul 18 17:43:46 2013 From: chandlerc at google.com (Chandler Carruth) Date: Thu, 18 Jul 2013 17:43:46 -0700 Subject: [LLVMdev] Debugging buildbot failure In-Reply-To: References: Message-ID: Chromium has long enjoyed the concept of 'try-bots' in addition to build bots. Thus far, the LLVM community hasn't needed them badly enough for anyone to go through the pain of setting them up. 
Once you think you have the fix, just recommit and watch the bots. On Thu, Jul 18, 2013 at 4:51 PM, Rui Ueyama wrote: > Yes, it's very likely. It'd be very convenient if we could send a patch to > buildbots for testing without actually submitting. > > > On Thu, Jul 18, 2013 at 4:38 PM, Eli Friedman wrote: > >> On Thu, Jul 18, 2013 at 4:29 PM, Rui Ueyama wrote: >> > Hi LLVMdev, >> > >> > My recent commit r186623 caused buildbots for s390-linux and >> ppc64-linux to >> > fail. I rolled back that commit, and I'm trying to fix to re-submit. >> > >> > Is there any good way to debug the issue which does not occur on my >> local >> > x86-64 machine? I don't have S390 nor PPC64 machines. >> >> If it's failing on S390 and PPC64 in particular, your code probably >> accidentally depends on the host endianness. (Looking at the commit >> in question seems to confirm that.) >> >> -Eli >> > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brianherman at gmail.com Thu Jul 18 17:48:26 2013 From: brianherman at gmail.com (Brian Herman) Date: Thu, 18 Jul 2013 19:48:26 -0500 Subject: [LLVMdev] issues for mac os building llvm? In-Reply-To: References: <51E87A74.9040308@mips.com> <51E87F6D.70900@mips.com> Message-ID: Yea a lot of iphone developers would be angry. LOL. On Thu, Jul 18, 2013 at 6:58 PM, Stephen Lin wrote: > > I just bought an 11" mac air for when I'm travelling or somewhere else > that > > is not at work and wanted to be able to do llvm work on it. > > > > I got the the 8 gig version but with a 128 gig flash drive because it > seemed > > to be enough but if I had to create another partition for llvm, then > maybe I > > should get the 256 gig version. > > > > It's not really a work computer but I wanted to be able to do some > things on > > it. 
> > I think a lot of people that work on llvm would be in trouble if it > didn't build on OS X :) > > -Stephen > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -- Thanks, Brian Herman college.nfshost.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From eli.friedman at gmail.com Thu Jul 18 17:56:07 2013 From: eli.friedman at gmail.com (Eli Friedman) Date: Thu, 18 Jul 2013 17:56:07 -0700 Subject: [LLVMdev] Request to review patch for bug #14792 In-Reply-To: References: Message-ID: On Thu, Jul 18, 2013 at 8:36 AM, Yun-Wei Lee wrote: > http://llvm.org/bugs/show_bug.cgi?id=14792 > > Problem: > In the i386 ABI Page 3-10, it said that the stack is aligned. However, the > two example code show that does not handle the alignment correctly when > using variadic function. For example, if the size of the first argument is > 17, the overflow_arg_area in va_list will be set to "address of first > argument + 16" instead of "address of first argument + 24" after calling > va_start. > In addition, #6636 showed the same problem because in AMD64, arguments is > passed by register at first, then pass by memory when run out of register > (AMD64 ABI 3.5.7 rule 10). > > Why this problem happened? > When calling va_start to set va_list, overflow_arg_area is not set > correctly. To set the overflow_arg_area correctly, we need to get the > FrameIndex correctly. Now, here comes the problem, llvm doesn't handle it > correctly. It accounts for StackSize to compute the FrameIndex, and if the > StackSize is not aligned, it will compute the wrong FrameIndex. As a result > overflow_arg_area will not be set correctly. > > My Solution: > 1. Record the Align if it is located in Memory. > 2. If it is variadic function and needs to set FrameIndex, adjust the > stacksize. Please read http://llvm.org/docs/DeveloperPolicy.html . 
In particular, patches should be sent to llvm-commits, and patches should generally include a regression test. In terms of the code, you might want to consider using llvm::RoundUpToAlignment. -Eli From peter at pcc.me.uk Thu Jul 18 18:06:36 2013 From: peter at pcc.me.uk (Peter Collingbourne) Date: Thu, 18 Jul 2013 18:06:36 -0700 Subject: [LLVMdev] Proposal: function prefix data In-Reply-To: References: <20130718010609.GA17472@pcc.me.uk> <20130718191345.GA17545@pcc.me.uk> Message-ID: <20130719010636.GA4527@pcc.me.uk> On Thu, Jul 18, 2013 at 02:59:42PM -0700, Sean Silva wrote: > So far you only seem to have presented the GHC ABI use case (and see just > below about UBSan). Do you know of any other use cases? Not concretely. I was referring to other language implementations which might use runtime function metadata. > > we might as well use it in UBSan as > > opposed to something which is going to be strictly slower. > > > > Below you use UBSan's use case as a motivating example, which seems > incongruous to this "might as well use it in UBSan" attitude. Without > having evaluated alternatives as being "too slow" for UBSan, I don't think > that UBSan's use case should be used to drive this proposal. OK. So let's approach this from the GHC/runtime-function-metadata-based-language standpoint. I would argue that the client still ought to have some control over where the data appears relative to the function. This might be for the sake of conformance with an existing ABI for that language. For example, in the existing GHC tables-next-to-code ABI, the data appears right before the function. Given the ability to do this, is it too much of a stretch for the client to be able to specify where the symbol should be located? This is something that will be needed anyway, as explained in my symbol offset proposal. And provided that the client behaves, it shouldn't impose any additional burden on LLVM itself. 
> I don't have an issue with target-dependent things per se; I just think > that they should be given a bit more thought and not added unless existing > mechanisms are insufficient. For example, could this be implemented as a > late IR pass that adds a piece of inline-asm to the beginning of the > function? I don't like it, for four reasons: 1) Inline asm is just as target dependent as prefix data (perhaps even more so, if you consider that different targets may have different flavours of asm). 2) It takes control of the specific encoding of the instructions out of your hands, which can be important if you use it as a signature. 3) It inhibits optimisation, as it becomes more difficult to optimise away loads through a known function pointer. 4) The backend will probably need to be taught to treat this particular piece of asm specially, i.e. by not emitting a function prelude until it is emitted. By contrast, the backend can be taught to emit prefix data trivially with two lines of code. Thanks, -- Peter From eli.friedman at gmail.com Thu Jul 18 18:14:56 2013 From: eli.friedman at gmail.com (Eli Friedman) Date: Thu, 18 Jul 2013 18:14:56 -0700 Subject: [LLVMdev] clang searching for many linux directories that do not exist on FreeBSD host In-Reply-To: <51E82E36.7020505@pix.net> References: <51E82E36.7020505@pix.net> Message-ID: On Thu, Jul 18, 2013 at 11:04 AM, Kurt Lidl wrote: > > Greetings - > > I'm a user of clang (3.3), as it is the system compiler for my > installation of FreeBSD. (In FreeBSD 10, it will be the default > compiler, but that's not my point.) My system identifies itself > as: > > FreeBSD 9.2-PRERELEASE #0: Tue Jul 16 13:00:08 EDT 2013 > lidl at nine0:/usr/obj/usr/src/sys/GENERIC > > Recently, in preparation for the upcoming 9.2 release, they > imported the llvm 3.3 tree. That works fine for me. > > I did notice (while looking at an unrelated problem), that > clang looks around for a bunch of linux directories every time it > is started. 
And those directories are never going to be found, > at least not on a FreeBSD system. > > It was suggested that I take the issue here, rather than attempting > to fix it with a locally maintained FreeBSD patch. > > I've included my trivial test program, and the ktrace output. > I trimmed output to be the legitimate accesses to shared libraries, > etc that clang must make - leaving most of the extraneous accesses. > > It was easy to trace these patterns back to the file: > tools/clang/lib/Driver/ToolChains.cpp > > So my question is this: Is there any easy modification to make that > will allow clang to skip doing all this work for no gain? It seems > silly to me, when its being used as the system compiler, to have it > call stat() a little over two hundred times, each time > the compiler is started up. It's straightforward: you just need to make toolchains::FreeBSD inherit directly from ToolChain and implement all the methods it would otherwise inherit from Generic_ELF (which in turn inherits from Generic_GCC). -Eli From bryan.yang at thomsonreuters.com Thu Jul 18 19:57:41 2013 From: bryan.yang at thomsonreuters.com (bryan.yang at thomsonreuters.com) Date: Fri, 19 Jul 2013 02:57:41 +0000 Subject: [LLVMdev] [llvm 3.1] About the symbol count constraint in one obj file Message-ID: Hi, When using LLVM 3.1 to generate bitcode (then to obj file) on Windows, if there are a lot of symbols in one function (lots of local vars), then many of them are missing from the obj file. It seems that there is a symbol count constraint. Is there such a constraint? If so, what is the max number? And is there any change in LLVM 3.3? Thanks. Regards, Bryan This email was sent to you by Thomson Reuters, the global news and information company. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of Thomson Reuters. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From peter at uformia.com Thu Jul 18 21:09:13 2013 From: peter at uformia.com (Peter Newman) Date: Fri, 19 Jul 2013 14:09:13 +1000 Subject: [LLVMdev] SIMD instructions and memory alignment on X86 In-Reply-To: References: <51E5F5ED.1000808@uformia.com> <7CA02EA7-A9F2-4A91-A10B-24ED35111F3F@cs.stanford.edu> <51E789ED.2080509@uformia.com> Message-ID: <51E8BBE9.8020407@uformia.com> I've attached the module->dump() that our code is producing. Unfortunately this is the smallest test case I have available. This is before any optimization passes are applied. There are two separate modules in existence at the time, and there are no guarantees about the order the surrounding code calls those functions, so there may be some interaction between them? There shouldn't be, they don't refer to any common memory etc. There is no multi-threading occurring. The function in module-dump.ll (called crashfunc in this file) is called with - func_params 0x0018f3b0 double [3] [0x0] -11.339976634695301 double [0x1] -9.7504239056205506 double [0x2] -5.2900856817382804 double at the time of the exception. This is compiled on a "i686-pc-win32" triple. All of the non-intrinsic functions referred to in these modules are the standard equivalents from the MSVC library (e.g. @asin is the standard C lib double asin( double ) ). Hopefully this is reproducible for you. -- PeterN On 18/07/2013 4:37 PM, Craig Topper wrote: > Are you able to send any IR for others to reproduce this issue? > > > On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman > wrote: > > Unfortunately, this doesn't appear to be the bug I'm hitting. I > applied the fix to my source and it didn't make a difference. > > Also further testing found me getting the same behavior with other > SIMD instructions. The common factor is in each case, ECX is set > to 0x7fffffff, and it's an operation using xmm ptr ecx+offset . 
> > Additionally, turning the optimization level passed to createJIT > down appears to avoid it, so I'm now leaning towards a bug in one > of the optimization passes. > > I'm going to dig through the passes controlled by that parameter > and see if I can narrow down which optimization is causing it. > > Peter N > > > On 17/07/2013 1:58 PM, Solomon Boulos wrote: > > As someone off list just told me, perhaps my new bug is the > same issue: > > http://llvm.org/bugs/show_bug.cgi?id=16640 > > Do you happen to be using FastISel? > > Solomon > > On Jul 16, 2013, at 6:39 PM, Peter Newman > wrote: > > Hello all, > > I'm currently in the process of debugging a crash > occurring in our program. In LLVM 3.2 and 3.3 it appears > that JIT generated code is attempting to perform access > unaligned memory with a SSE2 instruction. However this > only happens under certain conditions that seem (but may > not be) related to the stacks state on calling the function. > > Our program acts as a front-end, using the LLVM C++ API to > generate a JIT generated function. This function is > primarily mathematical, so we use the Vector types to take > advantage of SIMD instructions (as well as a few SSE2 > intrinsics). > > This worked in LLVM 2.8 but started failing in 3.2 and has > continued to fail in 3.3. It fails with no optimizations > applied to the LLVM Function/Module. It crashes with what > is reported as a memory access error (accessing > 0xffffffff), however it's suggested that this is how the > SSE fault raising mechanism appears. > > The generated instruction varies, but it seems to often be > similar to (I don't have it in front of me, sorry): > movapd xmm0, xmm[ecx+0x???????] > Where the xmm register changes, and the second parameter > is a memory access. > ECX is always set to 0x7ffffff - however I don't know if > this is part of the SSE error reporting process or is part > of the situation causing the error. 
> > I haven't worked out exactly what code path etc is causing > this crash. I'm hoping that someone can tell me if there > were any changed requirements for working with SIMD in > LLVM 3.2 (or earlier, we haven't tried 3.0 or 3.1). I > currently suspect the use of GlobalVariable (we first > discovered the crash when using a feature that uses them), > however I have attempted using setAlignment on the > GlobalVariables without any change. > > -- > Peter N > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu > http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu > http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > -- > ~Craig -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- ; ModuleID = 'crashmodule' @"460" = private constant [12 x <2 x double>] [<2 x double> , <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> , <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> , <2 x double> zeroinitializer] @"461" = private constant [12 x <2 x double>] [<2 x double> , <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> , <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> , <2 x double> zeroinitializer] @"462" = private constant [24 x <2 x double>] [<2 x double> , <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> , <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> 
zeroinitializer, <2 x double> zeroinitializer, <2 x double> , <2 x double> zeroinitializer, <2 x double> , <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> , <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> , <2 x double> zeroinitializer] define double @crashfunc(double* %params) { body: %0 = alloca <2 x double> %1 = alloca <4 x double> %2 = alloca { <2 x double>, <2 x double>, <2 x double> } %3 = load { <2 x double>, <2 x double>, <2 x double> }* %2 %4 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %3, 0 %5 = getelementptr double* %params, i32 0 %6 = load double* %5 %7 = insertelement <2 x double> %4, double %6, i32 0 %8 = insertelement <2 x double> %7, double %6, i32 1 %9 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %3, <2 x double> %8, 0 %10 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %9, 1 %11 = getelementptr double* %params, i32 1 %12 = load double* %11 %13 = insertelement <2 x double> %10, double %12, i32 0 %14 = insertelement <2 x double> %13, double %12, i32 1 %15 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %9, <2 x double> %14, 1 %16 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %15, 2 %17 = getelementptr double* %params, i32 2 %18 = load double* %17 %19 = insertelement <2 x double> %16, double %18, i32 0 %20 = insertelement <2 x double> %19, double %18, i32 1 %21 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %15, <2 x double> %20, 2 store <4 x double> zeroinitializer, <4 x double>* %1 store <2 x double> zeroinitializer, <2 x double>* %0 br label %array_loop array_loop: ; preds = %array_loop_tail, %body %22 = load <4 x double>* %1 %23 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %21, 0 %24 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %21, 1 %25 = extractvalue { <2 x double>, <2 
x double>, <2 x double> } %21, 2 %26 = extractelement <4 x double> %22, i32 0 %27 = insertelement <2 x double> zeroinitializer, double %26, i32 0 %28 = insertelement <2 x double> %27, double %26, i32 1 %29 = fmul <2 x double> %28, %30 = fsub <2 x double> %23, %29 %31 = fmul <2 x double> %28, zeroinitializer %32 = fsub <2 x double> %24, %31 %33 = fmul <2 x double> %28, zeroinitializer %34 = fsub <2 x double> %25, %33 %35 = extractelement <4 x double> %22, i32 1 %36 = insertelement <2 x double> zeroinitializer, double %35, i32 0 %37 = insertelement <2 x double> %36, double %35, i32 1 %38 = fmul <2 x double> %37, zeroinitializer %39 = fsub <2 x double> %30, %38 %40 = fmul <2 x double> %37, %41 = fsub <2 x double> %32, %40 %42 = fmul <2 x double> %37, zeroinitializer %43 = fsub <2 x double> %34, %42 %44 = extractelement <4 x double> %22, i32 2 %45 = insertelement <2 x double> zeroinitializer, double %44, i32 0 %46 = insertelement <2 x double> %45, double %44, i32 1 %47 = fmul <2 x double> %46, zeroinitializer %48 = fsub <2 x double> %39, %47 %49 = fmul <2 x double> %46, zeroinitializer %50 = fsub <2 x double> %41, %49 %51 = fmul <2 x double> %46, %52 = fsub <2 x double> %43, %51 %53 = extractelement <4 x double> %22, i32 0 %54 = fptoui double %53 to i32 %55 = mul i32 %54, 12 %56 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %55 %57 = load <2 x double>* %56 %58 = add i32 %55, 1 %59 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %58 %60 = load <2 x double>* %59 %61 = add i32 %58, 1 %62 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %61 %63 = load <2 x double>* %62 %64 = add i32 %61, 1 %65 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %64 %66 = load <2 x double>* %65 %67 = add i32 %64, 1 %68 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %67 %69 = load <2 x double>* %68 %70 = add i32 %67, 1 %71 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %70 %72 = load <2 x double>* %71 %73 = add i32 %70, 1 %74 = getelementptr [12 
x <2 x double>]* @"460", i32 0, i32 %73 %75 = load <2 x double>* %74 %76 = add i32 %73, 1 %77 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %76 %78 = load <2 x double>* %77 %79 = add i32 %76, 1 %80 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %79 %81 = load <2 x double>* %80 %82 = add i32 %79, 1 %83 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %82 %84 = load <2 x double>* %83 %85 = add i32 %82, 1 %86 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %85 %87 = load <2 x double>* %86 %88 = add i32 %85, 1 %89 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %88 %90 = load <2 x double>* %89 %91 = fmul <2 x double> %52, %63 %92 = fmul <2 x double> %50, %60 %93 = fmul <2 x double> %48, %57 %94 = fadd <2 x double> %93, %92 %95 = fadd <2 x double> %94, %91 %96 = fadd <2 x double> %95, %66 %97 = fmul <2 x double> %52, %75 %98 = fmul <2 x double> %50, %72 %99 = fmul <2 x double> %48, %69 %100 = fadd <2 x double> %99, %98 %101 = fadd <2 x double> %100, %97 %102 = fadd <2 x double> %101, %78 %103 = fmul <2 x double> %52, %87 %104 = fmul <2 x double> %50, %84 %105 = fmul <2 x double> %48, %81 %106 = fadd <2 x double> %105, %104 %107 = fadd <2 x double> %106, %103 %108 = fadd <2 x double> %107, %90 %109 = extractelement <4 x double> %22, i32 1 %110 = fptoui double %109 to i32 %111 = mul i32 %110, 12 %112 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %111 %113 = load <2 x double>* %112 %114 = add i32 %111, 1 %115 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %114 %116 = load <2 x double>* %115 %117 = add i32 %114, 1 %118 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %117 %119 = load <2 x double>* %118 %120 = add i32 %117, 1 %121 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %120 %122 = load <2 x double>* %121 %123 = add i32 %120, 1 %124 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %123 %125 = load <2 x double>* %124 %126 = add i32 %123, 1 %127 = getelementptr [12 x <2 x double>]* 
@"461", i32 0, i32 %126 %128 = load <2 x double>* %127 %129 = add i32 %126, 1 %130 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %129 %131 = load <2 x double>* %130 %132 = add i32 %129, 1 %133 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %132 %134 = load <2 x double>* %133 %135 = add i32 %132, 1 %136 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %135 %137 = load <2 x double>* %136 %138 = add i32 %135, 1 %139 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %138 %140 = load <2 x double>* %139 %141 = add i32 %138, 1 %142 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %141 %143 = load <2 x double>* %142 %144 = add i32 %141, 1 %145 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %144 %146 = load <2 x double>* %145 %147 = fmul <2 x double> %108, %119 %148 = fmul <2 x double> %102, %116 %149 = fmul <2 x double> %96, %113 %150 = fadd <2 x double> %149, %148 %151 = fadd <2 x double> %150, %147 %152 = fadd <2 x double> %151, %122 %153 = fmul <2 x double> %108, %131 %154 = fmul <2 x double> %102, %128 %155 = fmul <2 x double> %96, %125 %156 = fadd <2 x double> %155, %154 %157 = fadd <2 x double> %156, %153 %158 = fadd <2 x double> %157, %134 %159 = fmul <2 x double> %108, %143 %160 = fmul <2 x double> %102, %140 %161 = fmul <2 x double> %96, %137 %162 = fadd <2 x double> %161, %160 %163 = fadd <2 x double> %162, %159 %164 = fadd <2 x double> %163, %146 %165 = extractelement <4 x double> %22, i32 2 %166 = fptoui double %165 to i32 %167 = mul i32 %166, 12 %168 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %167 %169 = load <2 x double>* %168 %170 = add i32 %167, 1 %171 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %170 %172 = load <2 x double>* %171 %173 = add i32 %170, 1 %174 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %173 %175 = load <2 x double>* %174 %176 = add i32 %173, 1 %177 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %176 %178 = load <2 x double>* %177 %179 = add i32 
%176, 1 %180 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %179 %181 = load <2 x double>* %180 %182 = add i32 %179, 1 %183 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %182 %184 = load <2 x double>* %183 %185 = add i32 %182, 1 %186 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %185 %187 = load <2 x double>* %186 %188 = add i32 %185, 1 %189 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %188 %190 = load <2 x double>* %189 %191 = add i32 %188, 1 %192 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %191 %193 = load <2 x double>* %192 %194 = add i32 %191, 1 %195 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %194 %196 = load <2 x double>* %195 %197 = add i32 %194, 1 %198 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %197 %199 = load <2 x double>* %198 %200 = add i32 %197, 1 %201 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %200 %202 = load <2 x double>* %201 %203 = fmul <2 x double> %164, %175 %204 = fmul <2 x double> %158, %172 %205 = fmul <2 x double> %152, %169 %206 = fadd <2 x double> %205, %204 %207 = fadd <2 x double> %206, %203 %208 = fadd <2 x double> %207, %178 %209 = fmul <2 x double> %164, %187 %210 = fmul <2 x double> %158, %184 %211 = fmul <2 x double> %152, %181 %212 = fadd <2 x double> %211, %210 %213 = fadd <2 x double> %212, %209 %214 = fadd <2 x double> %213, %190 %215 = fmul <2 x double> %164, %199 %216 = fmul <2 x double> %158, %196 %217 = fmul <2 x double> %152, %193 %218 = fadd <2 x double> %217, %216 %219 = fadd <2 x double> %218, %215 %220 = fadd <2 x double> %219, %202 %221 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %21, <2 x double> %208, 0 %222 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %221, <2 x double> %214, 1 %223 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %222, <2 x double> %220, 2 %224 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %223, 0 %225 = fsub <2 x double> %224, zeroinitializer %226 = fmul <2 
x double> %225, %227 = fmul <2 x double> %226, %226 %228 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %223, 1 %229 = fsub <2 x double> %228, zeroinitializer %230 = fmul <2 x double> %229, %231 = fmul <2 x double> %230, %230 %232 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %223, 2 %233 = fsub <2 x double> %232, zeroinitializer %234 = fmul <2 x double> %233, %235 = fmul <2 x double> %234, %234 %236 = fadd <2 x double> %227, %231 %237 = fadd <2 x double> %236, %235 %238 = fsub <2 x double> , %237 %239 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %223, 0 %240 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %223, 1 %241 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %223, 2 %242 = fsub <2 x double> %239, %243 = fsub <2 x double> %240, %244 = fsub <2 x double> %241, zeroinitializer %245 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %223, <2 x double> %242, 0 %246 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %245, <2 x double> %243, 1 %247 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %246, <2 x double> %244, 2 %248 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %247, 0 %249 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %247, 1 %250 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %247, 2 %251 = fsub <2 x double> %248, zeroinitializer %252 = fsub <2 x double> %249, zeroinitializer %253 = fsub <2 x double> %250, zeroinitializer %254 = fmul <2 x double> %251, %255 = fmul <2 x double> %253, zeroinitializer %256 = fsub <2 x double> %254, %255 %257 = fmul <2 x double> %251, zeroinitializer %258 = fmul <2 x double> %253, %259 = fadd <2 x double> %257, %258 %260 = fmul <2 x double> %256, %261 = fmul <2 x double> %252, %262 = fadd <2 x double> %260, %261 %263 = fmul <2 x double> %256, %264 = fmul <2 x double> %252, %265 = fsub <2 x double> %264, %263 %266 = fmul <2 x double> %265, %267 = fmul <2 x double> %259, zeroinitializer %268 = 
fadd <2 x double> %266, %267 %269 = fmul <2 x double> %265, zeroinitializer %270 = fmul <2 x double> %259, %271 = fsub <2 x double> %270, %269 %272 = fadd <2 x double> %262, zeroinitializer %273 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %247, <2 x double> %272, 0 %274 = fadd <2 x double> %268, zeroinitializer %275 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %273, <2 x double> %274, 1 %276 = fadd <2 x double> %271, zeroinitializer %277 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %275, <2 x double> %276, 2 %278 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %277, 0 %279 = fsub <2 x double> %278, zeroinitializer %280 = fmul <2 x double> %279, %281 = fmul <2 x double> %280, %280 %282 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %277, 1 %283 = fsub <2 x double> %282, zeroinitializer %284 = fmul <2 x double> %283, %285 = fmul <2 x double> %284, %284 %286 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %277, 2 %287 = fsub <2 x double> %286, zeroinitializer %288 = fmul <2 x double> %287, %289 = fmul <2 x double> %288, %288 %290 = fadd <2 x double> %281, %285 %291 = fadd <2 x double> %290, %289 %292 = fsub <2 x double> , %291 %293 = fmul <2 x double> %238, %294 = fmul <2 x double> %293, %293 %295 = fmul <2 x double> %292, %296 = fmul <2 x double> %295, %295 %297 = fadd <2 x double> %294, %296 %298 = fadd <2 x double> %297, %299 = extractelement <2 x double> %298, i32 0 %300 = fdiv double 1.000000e+00, %299 %301 = insertelement <2 x double> %298, double %300, i32 0 %302 = extractelement <2 x double> %301, i32 1 %303 = fdiv double 1.000000e+00, %302 %304 = insertelement <2 x double> %301, double %303, i32 1 %305 = fmul <2 x double> %304, %306 = fmul <2 x double> %292, %292 %307 = fmul <2 x double> %238, %238 %308 = fadd <2 x double> %307, %306 %309 = call <2 x double> @llvm.x86.sse2.sqrt.pd(<2 x double> %308) %310 = fadd <2 x double> %238, %292 %311 = fadd <2 x double> %310, %309 %312 = 
fadd <2 x double> %311, %305 %313 = load <4 x double>* %1 %314 = extractelement <4 x double> %313, i32 0 %315 = extractelement <4 x double> %313, i32 1 %316 = fadd double %314, %315 %317 = extractelement <4 x double> %313, i32 2 %318 = fadd double %316, %317 %319 = fcmp oeq double %318, 0.000000e+00 %320 = load <2 x double>* %0 %321 = fmul <2 x double> %312, %322 = fmul <2 x double> %321, %321 %323 = fmul <2 x double> %320, %320 %324 = fadd <2 x double> %323, %322 %325 = call <2 x double> @llvm.x86.sse2.sqrt.pd(<2 x double> %324) %326 = fadd <2 x double> %320, %321 %327 = fadd <2 x double> %326, %325 %328 = select i1 %319, <2 x double> %312, <2 x double> %327 store <2 x double> %328, <2 x double>* %0 br label %array_loop_tail array_loop_tail: ; preds = %array_loop %329 = extractelement <4 x double> %313, i32 0 %330 = fadd double %329, 1.000000e+00 %331 = fcmp oge double %330, 1.000000e+00 %332 = select i1 %331, double 0.000000e+00, double %330 %333 = insertelement <4 x double> %313, double %332, i32 0 %334 = extractelement <4 x double> %333, i32 1 %335 = fadd double %334, 1.000000e+00 %336 = select i1 %331, double %335, double %334 %337 = fcmp oge double %336, 1.000000e+00 %338 = select i1 %337, double 0.000000e+00, double %336 %339 = insertelement <4 x double> %333, double %338, i32 1 %340 = extractelement <4 x double> %339, i32 2 %341 = fadd double %340, 1.000000e+00 %342 = select i1 %337, double %341, double %340 %343 = fcmp oge double %342, 2.000000e+00 %344 = select i1 %343, double 0.000000e+00, double %342 %345 = insertelement <4 x double> %339, double %344, i32 2 store <4 x double> %345, <4 x double>* %1 br i1 %343, label %array_loop_end, label %array_loop array_loop_end: ; preds = %array_loop_tail %346 = load <2 x double>* %0 %347 = extractelement <2 x double> %346, i32 0 ret double %347 } ; Function Attrs: nounwind readonly declare double @llvm.sin.f64(double) #0 ; Function Attrs: nounwind readonly declare double @llvm.cos.f64(double) #0 ; Function Attrs: 
nounwind readnone declare double @asin(double) #1 ; Function Attrs: nounwind readnone declare double @acos(double) #1 ; Function Attrs: nounwind readnone declare double @atan(double) #1 ; Function Attrs: nounwind readnone declare double @flr(double) #1 ; Function Attrs: nounwind readonly declare double @llvm.exp.f64(double) #0 ; Function Attrs: nounwind readonly declare double @llvm.log.f64(double) #0 ; Function Attrs: nounwind readnone declare void @dump(double) #1 ; Function Attrs: nounwind readonly declare double @llvm.pow.f64(double, double) #0 ; Function Attrs: nounwind readnone declare <2 x double> @llvm.x86.sse2.sqrt.pd(<2 x double>) #1 attributes #0 = { nounwind readonly } attributes #1 = { nounwind readnone } -------------- next part -------------- A non-text attachment was scrubbed... Name: module-dump-2.zip Type: application/octet-stream Size: 23030 bytes Desc: not available URL: From baldrick at free.fr Thu Jul 18 21:44:49 2013 From: baldrick at free.fr (Duncan Sands) Date: Fri, 19 Jul 2013 06:44:49 +0200 Subject: [LLVMdev] Proposal: function prefix data In-Reply-To: References: <20130718010609.GA17472@pcc.me.uk> Message-ID: <51E8C441.5070106@free.fr> Hi, > What if the prefix data was stored before the start of the function > code? The function's symbol will point to the code just as before, > eliminating the need to have instructions that skip the prefix data. how many platforms would this work on? Last time I tried something analogous to this it fell through because the Darwin object code format didn't support it. I'm not saying that what you are suggesting wouldn't work on Darwin - I don't know. But given my past experience it would be wise to do a platform survey before going too far. Ciao, Duncan. 
From peter at uformia.com Thu Jul 18 22:12:58 2013 From: peter at uformia.com (Peter Newman) Date: Fri, 19 Jul 2013 15:12:58 +1000 Subject: [LLVMdev] SIMD instructions and memory alignment on X86 In-Reply-To: <51E8BBE9.8020407@uformia.com> References: <51E5F5ED.1000808@uformia.com> <7CA02EA7-A9F2-4A91-A10B-24ED35111F3F@cs.stanford.edu> <51E789ED.2080509@uformia.com> <51E8BBE9.8020407@uformia.com> Message-ID: <51E8CADA.1040506@uformia.com> After stepping through the produced assembly, I believe I have a culprit. One of the calls to @frep.x86.sse2.sqrt.pd is modifying the value of ECX - while the produced code is expecting it to still contain its previous value. Peter N On 19/07/2013 2:09 PM, Peter Newman wrote: > I've attached the module->dump() that our code is producing. > Unfortunately this is the smallest test case I have available. > > This is before any optimization passes are applied. There are two > separate modules in existence at the time, and there are no guarantees > about the order the surrounding code calls those functions, so there > may be some interaction between them? There shouldn't be, they don't > refer to any common memory etc. There is no multi-threading occurring. > > The function in module-dump.ll (called crashfunc in this file) is > called with > - func_params 0x0018f3b0 double [3] > [0x0] -11.339976634695301 double > [0x1] -9.7504239056205506 double > [0x2] -5.2900856817382804 double > at the time of the exception. > > This is compiled on a "i686-pc-win32" triple. All of the non-intrinsic > functions referred to in these modules are the standard equivalents > from the MSVC library (e.g. @asin is the standard C lib double > asin( double ) ). > > Hopefully this is reproducible for you. > > -- > PeterN > > On 18/07/2013 4:37 PM, Craig Topper wrote: >> Are you able to send any IR for others to reproduce this issue? 
>> >> >> On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman > > wrote: >> >> Unfortunately, this doesn't appear to be the bug I'm hitting. I >> applied the fix to my source and it didn't make a difference. >> >> Also further testing found me getting the same behavior with >> other SIMD instructions. The common factor is in each case, ECX >> is set to 0x7fffffff, and it's an operation using xmm ptr >> ecx+offset . >> >> Additionally, turning the optimization level passed to createJIT >> down appears to avoid it, so I'm now leaning towards a bug in one >> of the optimization passes. >> >> I'm going to dig through the passes controlled by that parameter >> and see if I can narrow down which optimization is causing it. >> >> Peter N >> >> >> On 17/07/2013 1:58 PM, Solomon Boulos wrote: >> >> As someone off list just told me, perhaps my new bug is the >> same issue: >> >> http://llvm.org/bugs/show_bug.cgi?id=16640 >> >> Do you happen to be using FastISel? >> >> Solomon >> >> On Jul 16, 2013, at 6:39 PM, Peter Newman > > wrote: >> >> Hello all, >> >> I'm currently in the process of debugging a crash >> occurring in our program. In LLVM 3.2 and 3.3 it appears >> that JIT generated code is attempting to perform access >> unaligned memory with a SSE2 instruction. However this >> only happens under certain conditions that seem (but may >> not be) related to the stacks state on calling the function. >> >> Our program acts as a front-end, using the LLVM C++ API >> to generate a JIT generated function. This function is >> primarily mathematical, so we use the Vector types to >> take advantage of SIMD instructions (as well as a few >> SSE2 intrinsics). >> >> This worked in LLVM 2.8 but started failing in 3.2 and >> has continued to fail in 3.3. It fails with no >> optimizations applied to the LLVM Function/Module. It >> crashes with what is reported as a memory access error >> (accessing 0xffffffff), however it's suggested that this >> is how the SSE fault raising mechanism appears. 
>> >> The generated instruction varies, but it seems to often >> be similar to (I don't have it in front of me, sorry): >> movapd xmm0, xmm[ecx+0x???????] >> Where the xmm register changes, and the second parameter >> is a memory access. >> ECX is always set to 0x7ffffff - however I don't know if >> this is part of the SSE error reporting process or is >> part of the situation causing the error. >> >> I haven't worked out exactly what code path etc is >> causing this crash. I'm hoping that someone can tell me >> if there were any changed requirements for working with >> SIMD in LLVM 3.2 (or earlier, we haven't tried 3.0 or >> 3.1). I currently suspect the use of GlobalVariable (we >> first discovered the crash when using a feature that uses >> them), however I have attempted using setAlignment on the >> GlobalVariables without any change. >> >> -- >> Peter N >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu >> http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu >> http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> >> >> >> -- >> ~Craig > -------------- next part -------------- An HTML attachment was scrubbed... URL: From letz at grame.fr Thu Jul 18 22:24:31 2013 From: letz at grame.fr (=?iso-8859-1?Q?St=E9phane_Letz?=) Date: Fri, 19 Jul 2013 07:24:31 +0200 Subject: [LLVMdev] LLVM 3.3 JIT code speed In-Reply-To: <37D17E8A-300B-40EC-B96F-B12044C28D3A@grame.fr> References: <6E47DFE5-C272-4AF5-B348-1CCB0ABA5561@grame.fr> <9E89BCB4-5B8F-438C-8716-2CE0EAA3757B@grame.fr> <0983E6C011D2DC4188F8761B533492DE564019C0@ORSMSX104.amr.corp.intel.com> <37D17E8A-300B-40EC-B96F-B12044C28D3A@grame.fr> Message-ID: <41E2E7A4-4508-4014-9FE7-54FD7320BC73@grame.fr> Le 18 juil. 2013 à 23:51, Stéphane Letz a écrit : > > Le 18 juil. 
2013 à 21:05, "Kaylor, Andrew" a écrit : >> I understand you to mean that you have isolated the actual execution time as your point of comparison, as opposed to including runtime loading and so on. Is this correct? > > We are testing actual execution time yes : time used in a given JIT compiled function. >> >> >> One thing that changed between 3.1 and 3.3 is that MCJIT no longer compiles the module during the engine creation process but instead waits until either a function pointer is requested or finalizeObject is called. I would guess that you have taken that into account in your measurement technique, but it seemed worth mentioning. > > OK, so I guess our testing is then correct since we are testing actual execution time of the function pointer. >> >> >> What architecture/OS are you testing? > > 64 bits OSX (10.8.4) >> >> With LLVM 3.3 you can register a JIT event listener (using ExecutionEngine::RegisterJITEventListener) that MCJIT will call with a copy of the actual object image that gets generated. You could then write that image to a file as a basis for comparing the generated code. You can find a reference implementation of the interface in lib/ExecutionEngine/IntelJITEvents/IntelJITEventListener.cpp. > > Thanks I'll have a look. >> >> -Andy >> > > Stéphane > And since the 1) DSL ==> C/C++ ===> clang/gcc -O3 ===> exec code chain has the "correct" speed, there is no reason the JIT-based one should be slower, right? So I still guess something is wrong in the way we use the JIT and/or some LTO issue possibly? 
Stéphane From craig.topper at gmail.com Thu Jul 18 22:25:59 2013 From: craig.topper at gmail.com (Craig Topper) Date: Thu, 18 Jul 2013 22:25:59 -0700 Subject: [LLVMdev] SIMD instructions and memory alignment on X86 In-Reply-To: <51E8CADA.1040506@uformia.com> References: <51E5F5ED.1000808@uformia.com> <7CA02EA7-A9F2-4A91-A10B-24ED35111F3F@cs.stanford.edu> <51E789ED.2080509@uformia.com> <51E8BBE9.8020407@uformia.com> <51E8CADA.1040506@uformia.com> Message-ID: What is "frep.x86.sse2.sqrt.pd". I'm only familiar with things prefixed with "llvm.x86". On Thu, Jul 18, 2013 at 10:12 PM, Peter Newman wrote: > After stepping through the produced assembly, I believe I have a culprit. > > One of the calls to @frep.x86.sse2.sqrt.pd is modifying the value of ECX - > while the produced code is expecting it to still contain its previous value. > > Peter N > > > On 19/07/2013 2:09 PM, Peter Newman wrote: > > I've attached the module->dump() that our code is producing. Unfortunately > this is the smallest test case I have available. > > This is before any optimization passes are applied. There are two separate > modules in existence at the time, and there are no guarantees about the > order the surrounding code calls those functions, so there may be some > interaction between them? There shouldn't be, they don't refer to any > common memory etc. There is no multi-threading occurring. > > The function in module-dump.ll (called crashfunc in this file) is called > with > - func_params 0x0018f3b0 double [3] > [0x0] -11.339976634695301 double > [0x1] -9.7504239056205506 double > [0x2] -5.2900856817382804 double > at the time of the exception. > > This is compiled on a "i686-pc-win32" triple. All of the non-intrinsic > functions referred to in these modules are the standard equivalents from > the MSVC library (e.g. @asin is the standard C lib double asin( double ) > ). > > Hopefully this is reproducible for you. 
> > -- > PeterN > > On 18/07/2013 4:37 PM, Craig Topper wrote: > > Are you able to send any IR for others to reproduce this issue? > > > On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman wrote: > >> Unfortunately, this doesn't appear to be the bug I'm hitting. I applied >> the fix to my source and it didn't make a difference. >> >> Also further testing found me getting the same behavior with other SIMD >> instructions. The common factor is in each case, ECX is set to 0x7fffffff, >> and it's an operation using xmm ptr ecx+offset . >> >> Additionally, turning the optimization level passed to createJIT down >> appears to avoid it, so I'm now leaning towards a bug in one of the >> optimization passes. >> >> I'm going to dig through the passes controlled by that parameter and see >> if I can narrow down which optimization is causing it. >> >> Peter N >> >> >> On 17/07/2013 1:58 PM, Solomon Boulos wrote: >> >>> As someone off list just told me, perhaps my new bug is the same issue: >>> >>> http://llvm.org/bugs/show_bug.cgi?id=16640 >>> >>> Do you happen to be using FastISel? >>> >>> Solomon >>> >>> On Jul 16, 2013, at 6:39 PM, Peter Newman wrote: >>> >>> Hello all, >>>> >>>> I'm currently in the process of debugging a crash occurring in our >>>> program. In LLVM 3.2 and 3.3 it appears that JIT generated code is >>>> attempting to perform access unaligned memory with a SSE2 instruction. >>>> However this only happens under certain conditions that seem (but may not >>>> be) related to the stacks state on calling the function. >>>> >>>> Our program acts as a front-end, using the LLVM C++ API to generate a >>>> JIT generated function. This function is primarily mathematical, so we use >>>> the Vector types to take advantage of SIMD instructions (as well as a few >>>> SSE2 intrinsics). >>>> >>>> This worked in LLVM 2.8 but started failing in 3.2 and has continued to >>>> fail in 3.3. It fails with no optimizations applied to the LLVM >>>> Function/Module. 
It crashes with what is reported as a memory access error >>>> (accessing 0xffffffff), however it's suggested that this is how the SSE >>>> fault raising mechanism appears. >>>> >>>> The generated instruction varies, but it seems to often be similar to >>>> (I don't have it in front of me, sorry): >>>> movapd xmm0, xmm[ecx+0x???????] >>>> Where the xmm register changes, and the second parameter is a memory >>>> access. >>>> ECX is always set to 0x7ffffff - however I don't know if this is part >>>> of the SSE error reporting process or is part of the situation causing the >>>> error. >>>> >>>> I haven't worked out exactly what code path etc is causing this crash. >>>> I'm hoping that someone can tell me if there were any changed requirements >>>> for working with SIMD in LLVM 3.2 (or earlier, we haven't tried 3.0 or >>>> 3.1). I currently suspect the use of GlobalVariable (we first discovered >>>> the crash when using a feature that uses them), however I have attempted >>>> using setAlignment on the GlobalVariables without any change. >>>> >>>> -- >>>> Peter N >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>> >>> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> > > > > -- > ~Craig > > > > -- ~Craig -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From peter at uformia.com Thu Jul 18 22:27:28 2013 From: peter at uformia.com (Peter Newman) Date: Fri, 19 Jul 2013 15:27:28 +1000 Subject: [LLVMdev] SIMD instructions and memory alignment on X86 In-Reply-To: References: <51E5F5ED.1000808@uformia.com> <7CA02EA7-A9F2-4A91-A10B-24ED35111F3F@cs.stanford.edu> <51E789ED.2080509@uformia.com> <51E8BBE9.8020407@uformia.com> <51E8CADA.1040506@uformia.com> Message-ID: <51E8CE40.3050703@uformia.com> Sorry, that should have been llvm.x86.sse2.sqrt.pd On 19/07/2013 3:25 PM, Craig Topper wrote: > What is "frep.x86.sse2.sqrt.pd". I'm only familiar with things > prefixed with "llvm.x86". > > > On Thu, Jul 18, 2013 at 10:12 PM, Peter Newman > wrote: > > After stepping through the produced assembly, I believe I have a > culprit. > > One of the calls to @frep.x86.sse2.sqrt.pd is modifying the value > of ECX - while the produced code is expecting it to still contain > its previous value. > > Peter N > > > On 19/07/2013 2:09 PM, Peter Newman wrote: >> I've attached the module->dump() that our code is producing. >> Unfortunately this is the smallest test case I have available. >> >> This is before any optimization passes are applied. There are two >> separate modules in existence at the time, and there are no >> guarantees about the order the surrounding code calls those >> functions, so there may be some interaction between them? There >> shouldn't be, they don't refer to any common memory etc. There is >> no multi-threading occurring. >> >> The function in module-dump.ll (called crashfunc in this file) is >> called with >> - func_params 0x0018f3b0 double [3] >> [0x0] -11.339976634695301 double >> [0x1] -9.7504239056205506 double >> [0x2] -5.2900856817382804 double >> at the time of the exception. >> >> This is compiled on a "i686-pc-win32" triple. All of the >> non-intrinsic functions referred to in these modules are the >> standard equivalents from the MSVC library (e.g. @asin is the >> standard C lib double asin( double ) ). 
>> >> Hopefully this is reproducible for you. >> >> -- >> PeterN >> >> On 18/07/2013 4:37 PM, Craig Topper wrote: >>> Are you able to send any IR for others to reproduce this issue? >>> >>> >>> On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman >>> > wrote: >>> >>> Unfortunately, this doesn't appear to be the bug I'm >>> hitting. I applied the fix to my source and it didn't make a >>> difference. >>> >>> Also further testing found me getting the same behavior with >>> other SIMD instructions. The common factor is in each case, >>> ECX is set to 0x7fffffff, and it's an operation using xmm >>> ptr ecx+offset . >>> >>> Additionally, turning the optimization level passed to >>> createJIT down appears to avoid it, so I'm now leaning >>> towards a bug in one of the optimization passes. >>> >>> I'm going to dig through the passes controlled by that >>> parameter and see if I can narrow down which optimization is >>> causing it. >>> >>> Peter N >>> >>> >>> On 17/07/2013 1:58 PM, Solomon Boulos wrote: >>> >>> As someone off list just told me, perhaps my new bug is >>> the same issue: >>> >>> http://llvm.org/bugs/show_bug.cgi?id=16640 >>> >>> Do you happen to be using FastISel? >>> >>> Solomon >>> >>> On Jul 16, 2013, at 6:39 PM, Peter Newman >>> > wrote: >>> >>> Hello all, >>> >>> I'm currently in the process of debugging a crash >>> occurring in our program. In LLVM 3.2 and 3.3 it >>> appears that JIT generated code is attempting to >>> perform access unaligned memory with a SSE2 >>> instruction. However this only happens under certain >>> conditions that seem (but may not be) related to the >>> stacks state on calling the function. >>> >>> Our program acts as a front-end, using the LLVM C++ >>> API to generate a JIT generated function. This >>> function is primarily mathematical, so we use the >>> Vector types to take advantage of SIMD instructions >>> (as well as a few SSE2 intrinsics). 
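[Editor's note on the alignment point under discussion: SSE2's aligned double load (movapd) faults unless the address is 16-byte aligned, while the unaligned form (movupd) accepts any address. A minimal C sketch using the standard <emmintrin.h> intrinsics (illustrative code, not code from this thread) shows the distinction:]

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Sum two doubles via an aligned SSE2 load; `p` MUST be 16-byte
 * aligned, or the generated movapd faults at runtime. */
static double sum2_aligned(const double *p) {
    __m128d v = _mm_load_pd(p);   /* movapd: alignment required */
    return _mm_cvtsd_f64(_mm_add_sd(v, _mm_unpackhi_pd(v, v)));
}

/* Same computation via the unaligned load; safe for any address. */
static double sum2_unaligned(const double *p) {
    __m128d v = _mm_loadu_pd(p);  /* movupd: no alignment requirement */
    return _mm_cvtsd_f64(_mm_add_sd(v, _mm_unpackhi_pd(v, v)));
}
```

[This is why a GlobalVariable that ends up as a movapd operand needs 16-byte alignment (e.g. via setAlignment): the instruction itself enforces the requirement, independent of optimization level.]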
>>> >>> This worked in LLVM 2.8 but started failing in 3.2 >>> and has continued to fail in 3.3. It fails with no >>> optimizations applied to the LLVM Function/Module. >>> It crashes with what is reported as a memory access >>> error (accessing 0xffffffff), however it's suggested >>> that this is how the SSE fault raising mechanism >>> appears. >>> >>> The generated instruction varies, but it seems to >>> often be similar to (I don't have it in front of me, >>> sorry): >>> movapd xmm0, xmm[ecx+0x???????] >>> Where the xmm register changes, and the second >>> parameter is a memory access. >>> ECX is always set to 0x7ffffff - however I don't >>> know if this is part of the SSE error reporting >>> process or is part of the situation causing the error. >>> >>> I haven't worked out exactly what code path etc is >>> causing this crash. I'm hoping that someone can tell >>> me if there were any changed requirements for >>> working with SIMD in LLVM 3.2 (or earlier, we >>> haven't tried 3.0 or 3.1). I currently suspect the >>> use of GlobalVariable (we first discovered the crash >>> when using a feature that uses them), however I have >>> attempted using setAlignment on the GlobalVariables >>> without any change. >>> >>> -- >>> Peter N >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu >>> http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu >>> http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >>> >>> >>> >>> -- >>> ~Craig >> > > > > > -- > ~Craig -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From craig.topper at gmail.com Thu Jul 18 22:29:38 2013 From: craig.topper at gmail.com (Craig Topper) Date: Thu, 18 Jul 2013 22:29:38 -0700 Subject: [LLVMdev] SIMD instructions and memory alignment on X86 In-Reply-To: <51E8CE40.3050703@uformia.com> References: <51E5F5ED.1000808@uformia.com> <7CA02EA7-A9F2-4A91-A10B-24ED35111F3F@cs.stanford.edu> <51E789ED.2080509@uformia.com> <51E8BBE9.8020407@uformia.com> <51E8CADA.1040506@uformia.com> <51E8CE40.3050703@uformia.com> Message-ID: That should map directly to sqrtpd which can't modify ecx. On Thu, Jul 18, 2013 at 10:27 PM, Peter Newman wrote: > Sorry, that should have been llvm.x86.sse2.sqrt.pd > > > On 19/07/2013 3:25 PM, Craig Topper wrote: > > What is "frep.x86.sse2.sqrt.pd". I'm only familiar with things prefixed > with "llvm.x86". > > > On Thu, Jul 18, 2013 at 10:12 PM, Peter Newman wrote: > >> After stepping through the produced assembly, I believe I have a >> culprit. >> >> One of the calls to @frep.x86.sse2.sqrt.pd is modifying the value of ECX >> - while the produced code is expecting it to still contain its previous >> value. >> >> Peter N >> >> >> On 19/07/2013 2:09 PM, Peter Newman wrote: >> >> I've attached the module->dump() that our code is producing. >> Unfortunately this is the smallest test case I have available. >> >> This is before any optimization passes are applied. There are two >> separate modules in existence at the time, and there are no guarantees >> about the order the surrounding code calls those functions, so there may be >> some interaction between them? There shouldn't be, they don't refer to any >> common memory etc. There is no multi-threading occurring. >> >> The function in module-dump.ll (called crashfunc in this file) is called >> with >> - func_params 0x0018f3b0 double [3] >> [0x0] -11.339976634695301 double >> [0x1] -9.7504239056205506 double >> [0x2] -5.2900856817382804 double >> at the time of the exception. >> >> This is compiled on a "i686-pc-win32" triple. 
All of the non-intrinsic >> functions referred to in these modules are the standard equivalents from >> the MSVC library (e.g. @asin is the standard C lib double asin( double ) >> ). >> >> Hopefully this is reproducible for you. >> >> -- >> PeterN >> >> On 18/07/2013 4:37 PM, Craig Topper wrote: >> >> Are you able to send any IR for others to reproduce this issue? >> >> >> On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman wrote: >> >>> Unfortunately, this doesn't appear to be the bug I'm hitting. I applied >>> the fix to my source and it didn't make a difference. >>> >>> Also further testing found me getting the same behavior with other SIMD >>> instructions. The common factor is in each case, ECX is set to 0x7fffffff, >>> and it's an operation using xmm ptr ecx+offset . >>> >>> Additionally, turning the optimization level passed to createJIT down >>> appears to avoid it, so I'm now leaning towards a bug in one of the >>> optimization passes. >>> >>> I'm going to dig through the passes controlled by that parameter and see >>> if I can narrow down which optimization is causing it. >>> >>> Peter N >>> >>> >>> On 17/07/2013 1:58 PM, Solomon Boulos wrote: >>> >>>> As someone off list just told me, perhaps my new bug is the same issue: >>>> >>>> http://llvm.org/bugs/show_bug.cgi?id=16640 >>>> >>>> Do you happen to be using FastISel? >>>> >>>> Solomon >>>> >>>> On Jul 16, 2013, at 6:39 PM, Peter Newman wrote: >>>> >>>> Hello all, >>>>> >>>>> I'm currently in the process of debugging a crash occurring in our >>>>> program. In LLVM 3.2 and 3.3 it appears that JIT generated code is >>>>> attempting to perform access unaligned memory with a SSE2 instruction. >>>>> However this only happens under certain conditions that seem (but may not >>>>> be) related to the stacks state on calling the function. >>>>> >>>>> Our program acts as a front-end, using the LLVM C++ API to generate a >>>>> JIT generated function. 
This function is primarily mathematical, so we use >>>>> the Vector types to take advantage of SIMD instructions (as well as a few >>>>> SSE2 intrinsics). >>>>> >>>>> This worked in LLVM 2.8 but started failing in 3.2 and has continued >>>>> to fail in 3.3. It fails with no optimizations applied to the LLVM >>>>> Function/Module. It crashes with what is reported as a memory access error >>>>> (accessing 0xffffffff), however it's suggested that this is how the SSE >>>>> fault raising mechanism appears. >>>>> >>>>> The generated instruction varies, but it seems to often be similar to >>>>> (I don't have it in front of me, sorry): >>>>> movapd xmm0, xmm[ecx+0x???????] >>>>> Where the xmm register changes, and the second parameter is a memory >>>>> access. >>>>> ECX is always set to 0x7ffffff - however I don't know if this is part >>>>> of the SSE error reporting process or is part of the situation causing the >>>>> error. >>>>> >>>>> I haven't worked out exactly what code path etc is causing this crash. >>>>> I'm hoping that someone can tell me if there were any changed requirements >>>>> for working with SIMD in LLVM 3.2 (or earlier, we haven't tried 3.0 or >>>>> 3.1). I currently suspect the use of GlobalVariable (we first discovered >>>>> the crash when using a feature that uses them), however I have attempted >>>>> using setAlignment on the GlobalVariables without any change. >>>>> >>>>> -- >>>>> Peter N >>>>> _______________________________________________ >>>>> LLVM Developers mailing list >>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>> >>>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >> >> >> >> -- >> ~Craig >> >> >> >> > > > -- > ~Craig > > > -- ~Craig -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From peter at uformia.com Thu Jul 18 22:45:15 2013 From: peter at uformia.com (Peter Newman) Date: Fri, 19 Jul 2013 15:45:15 +1000 Subject: [LLVMdev] SIMD instructions and memory alignment on X86 In-Reply-To: References: <51E5F5ED.1000808@uformia.com> <7CA02EA7-A9F2-4A91-A10B-24ED35111F3F@cs.stanford.edu> <51E789ED.2080509@uformia.com> <51E8BBE9.8020407@uformia.com> <51E8CADA.1040506@uformia.com> <51E8CE40.3050703@uformia.com> Message-ID: <51E8D26B.3000108@uformia.com> In the disassembly, I'm seeing three cases of call 76719BA1 I am assuming this is the sqrt function as this is the only function called in the LLVM IR. The code at 76719BA1 is: 76719BA1 push ebp 76719BA2 mov ebp,esp 76719BA4 sub esp,20h 76719BA7 and esp,0FFFFFFF0h 76719BAA fld st(0) 76719BAC fst dword ptr [esp+18h] 76719BB0 fistp qword ptr [esp+10h] 76719BB4 fild qword ptr [esp+10h] 76719BB8 mov edx,dword ptr [esp+18h] 76719BBC mov eax,dword ptr [esp+10h] 76719BC0 test eax,eax 76719BC2 je 76719DCF 76719BC8 fsubp st(1),st 76719BCA test edx,edx 76719BCC js 7671F9DB 76719BD2 fstp dword ptr [esp] 76719BD5 mov ecx,dword ptr [esp] 76719BD8 add ecx,7FFFFFFFh 76719BDE sbb eax,0 76719BE1 mov edx,dword ptr [esp+14h] 76719BE5 sbb edx,0 76719BE8 leave 76719BE9 ret As you can see at 76719BD5, it modifies ECX . I don't know that this is the sqrtpd function (for example, I'm not seeing any SSE instructions here?) but whatever it is, it's being called from the IR I attached earlier, and is modifying ECX under some circumstances. On 19/07/2013 3:29 PM, Craig Topper wrote: > That should map directly to sqrtpd which can't modify ecx. > > > On Thu, Jul 18, 2013 at 10:27 PM, Peter Newman > wrote: > > Sorry, that should have been llvm.x86.sse2.sqrt.pd > > > On 19/07/2013 3:25 PM, Craig Topper wrote: >> What is "frep.x86.sse2.sqrt.pd". I'm only familiar with things >> prefixed with "llvm.x86". 
>> >> >> On Thu, Jul 18, 2013 at 10:12 PM, Peter Newman > > wrote: >> >> After stepping through the produced assembly, I believe I >> have a culprit. >> >> One of the calls to @frep.x86.sse2.sqrt.pd is modifying the >> value of ECX - while the produced code is expecting it to >> still contain its previous value. >> >> Peter N >> >> >> On 19/07/2013 2:09 PM, Peter Newman wrote: >>> I've attached the module->dump() that our code is producing. >>> Unfortunately this is the smallest test case I have available. >>> >>> This is before any optimization passes are applied. There >>> are two separate modules in existence at the time, and there >>> are no guarantees about the order the surrounding code calls >>> those functions, so there may be some interaction between >>> them? There shouldn't be, they don't refer to any common >>> memory etc. There is no multi-threading occurring. >>> >>> The function in module-dump.ll (called crashfunc in this >>> file) is called with >>> - func_params 0x0018f3b0 double [3] >>> [0x0] -11.339976634695301 double >>> [0x1] -9.7504239056205506 double >>> [0x2] -5.2900856817382804 double >>> at the time of the exception. >>> >>> This is compiled on a "i686-pc-win32" triple. All of the >>> non-intrinsic functions referred to in these modules are the >>> standard equivalents from the MSVC library (e.g. @asin is >>> the standard C lib double asin( double ) ). >>> >>> Hopefully this is reproducible for you. >>> >>> -- >>> PeterN >>> >>> On 18/07/2013 4:37 PM, Craig Topper wrote: >>>> Are you able to send any IR for others to reproduce this issue? >>>> >>>> >>>> On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman >>>> > wrote: >>>> >>>> Unfortunately, this doesn't appear to be the bug I'm >>>> hitting. I applied the fix to my source and it didn't >>>> make a difference. >>>> >>>> Also further testing found me getting the same behavior >>>> with other SIMD instructions. 
The common factor is in >>>> each case, ECX is set to 0x7fffffff, and it's an >>>> operation using xmm ptr ecx+offset . >>>> >>>> Additionally, turning the optimization level passed to >>>> createJIT down appears to avoid it, so I'm now leaning >>>> towards a bug in one of the optimization passes. >>>> >>>> I'm going to dig through the passes controlled by that >>>> parameter and see if I can narrow down which >>>> optimization is causing it. >>>> >>>> Peter N >>>> >>>> >>>> On 17/07/2013 1:58 PM, Solomon Boulos wrote: >>>> >>>> As someone off list just told me, perhaps my new >>>> bug is the same issue: >>>> >>>> http://llvm.org/bugs/show_bug.cgi?id=16640 >>>> >>>> Do you happen to be using FastISel? >>>> >>>> Solomon >>>> >>>> On Jul 16, 2013, at 6:39 PM, Peter Newman >>>> > wrote: >>>> >>>> Hello all, >>>> >>>> I'm currently in the process of debugging a >>>> crash occurring in our program. In LLVM 3.2 and >>>> 3.3 it appears that JIT generated code is >>>> attempting to perform access unaligned memory >>>> with a SSE2 instruction. However this only >>>> happens under certain conditions that seem (but >>>> may not be) related to the stacks state on >>>> calling the function. >>>> >>>> Our program acts as a front-end, using the LLVM >>>> C++ API to generate a JIT generated function. >>>> This function is primarily mathematical, so we >>>> use the Vector types to take advantage of SIMD >>>> instructions (as well as a few SSE2 intrinsics). >>>> >>>> This worked in LLVM 2.8 but started failing in >>>> 3.2 and has continued to fail in 3.3. It fails >>>> with no optimizations applied to the LLVM >>>> Function/Module. It crashes with what is >>>> reported as a memory access error (accessing >>>> 0xffffffff), however it's suggested that this >>>> is how the SSE fault raising mechanism appears. >>>> >>>> The generated instruction varies, but it seems >>>> to often be similar to (I don't have it in >>>> front of me, sorry): >>>> movapd xmm0, xmm[ecx+0x???????] 
>>>> Where the xmm register changes, and the second >>>> parameter is a memory access. >>>> ECX is always set to 0x7ffffff - however I >>>> don't know if this is part of the SSE error >>>> reporting process or is part of the situation >>>> causing the error. >>>> >>>> I haven't worked out exactly what code path etc >>>> is causing this crash. I'm hoping that someone >>>> can tell me if there were any changed >>>> requirements for working with SIMD in LLVM 3.2 >>>> (or earlier, we haven't tried 3.0 or 3.1). I >>>> currently suspect the use of GlobalVariable (we >>>> first discovered the crash when using a feature >>>> that uses them), however I have attempted using >>>> setAlignment on the GlobalVariables without any >>>> change. >>>> >>>> -- >>>> Peter N >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> LLVMdev at cs.uiuc.edu >>>> >>>> http://llvm.cs.uiuc.edu >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>> >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> LLVMdev at cs.uiuc.edu >>>> http://llvm.cs.uiuc.edu >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>> >>>> >>>> >>>> >>>> -- >>>> ~Craig >>> >> >> >> >> >> -- >> ~Craig > > > > > -- > ~Craig -------------- next part -------------- An HTML attachment was scrubbed... URL: From craig.topper at gmail.com Thu Jul 18 22:47:53 2013 From: craig.topper at gmail.com (Craig Topper) Date: Thu, 18 Jul 2013 22:47:53 -0700 Subject: [LLVMdev] SIMD instructions and memory alignment on X86 In-Reply-To: <51E8D26B.3000108@uformia.com> References: <51E5F5ED.1000808@uformia.com> <7CA02EA7-A9F2-4A91-A10B-24ED35111F3F@cs.stanford.edu> <51E789ED.2080509@uformia.com> <51E8BBE9.8020407@uformia.com> <51E8CADA.1040506@uformia.com> <51E8CE40.3050703@uformia.com> <51E8D26B.3000108@uformia.com> Message-ID: Hmm, maybe sse isn't being enabled so its falling back to emulating sqrt? 
On Thu, Jul 18, 2013 at 10:45 PM, Peter Newman wrote: > In the disassembly, I'm seeing three cases of > call 76719BA1 > > I am assuming this is the sqrt function as this is the only function > called in the LLVM IR. > > The code at 76719BA1 is: > > 76719BA1 push ebp > 76719BA2 mov ebp,esp > 76719BA4 sub esp,20h > 76719BA7 and esp,0FFFFFFF0h > 76719BAA fld st(0) > 76719BAC fst dword ptr [esp+18h] > 76719BB0 fistp qword ptr [esp+10h] > 76719BB4 fild qword ptr [esp+10h] > 76719BB8 mov edx,dword ptr [esp+18h] > 76719BBC mov eax,dword ptr [esp+10h] > 76719BC0 test eax,eax > 76719BC2 je 76719DCF > 76719BC8 fsubp st(1),st > 76719BCA test edx,edx > 76719BCC js 7671F9DB > 76719BD2 fstp dword ptr [esp] > 76719BD5 mov ecx,dword ptr [esp] > 76719BD8 add ecx,7FFFFFFFh > 76719BDE sbb eax,0 > 76719BE1 mov edx,dword ptr [esp+14h] > 76719BE5 sbb edx,0 > 76719BE8 leave > 76719BE9 ret > > > As you can see at 76719BD5, it modifies ECX . > > I don't know that this is the sqrtpd function (for example, I'm not seeing > any SSE instructions here?) but whatever it is, it's being called from the > IR I attached earlier, and is modifying ECX under some circumstances. > > > On 19/07/2013 3:29 PM, Craig Topper wrote: > > That should map directly to sqrtpd which can't modify ecx. > > > On Thu, Jul 18, 2013 at 10:27 PM, Peter Newman wrote: > >> Sorry, that should have been llvm.x86.sse2.sqrt.pd >> >> >> On 19/07/2013 3:25 PM, Craig Topper wrote: >> >> What is "frep.x86.sse2.sqrt.pd". I'm only familiar with things prefixed >> with "llvm.x86". >> >> >> On Thu, Jul 18, 2013 at 10:12 PM, Peter Newman wrote: >> >>> After stepping through the produced assembly, I believe I have a >>> culprit. >>> >>> One of the calls to @frep.x86.sse2.sqrt.pd is modifying the value of ECX >>> - while the produced code is expecting it to still contain its previous >>> value. >>> >>> Peter N >>> >>> >>> On 19/07/2013 2:09 PM, Peter Newman wrote: >>> >>> I've attached the module->dump() that our code is producing. 
>>> Unfortunately this is the smallest test case I have available. >>> >>> This is before any optimization passes are applied. There are two >>> separate modules in existence at the time, and there are no guarantees >>> about the order the surrounding code calls those functions, so there may be >>> some interaction between them? There shouldn't be, they don't refer to any >>> common memory etc. There is no multi-threading occurring. >>> >>> The function in module-dump.ll (called crashfunc in this file) is called >>> with >>> - func_params 0x0018f3b0 double [3] >>> [0x0] -11.339976634695301 double >>> [0x1] -9.7504239056205506 double >>> [0x2] -5.2900856817382804 double >>> at the time of the exception. >>> >>> This is compiled on a "i686-pc-win32" triple. All of the non-intrinsic >>> functions referred to in these modules are the standard equivalents from >>> the MSVC library (e.g. @asin is the standard C lib double asin( double ) >>> ). >>> >>> Hopefully this is reproducible for you. >>> >>> -- >>> PeterN >>> >>> On 18/07/2013 4:37 PM, Craig Topper wrote: >>> >>> Are you able to send any IR for others to reproduce this issue? >>> >>> >>> On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman wrote: >>> >>>> Unfortunately, this doesn't appear to be the bug I'm hitting. I applied >>>> the fix to my source and it didn't make a difference. >>>> >>>> Also further testing found me getting the same behavior with other SIMD >>>> instructions. The common factor is in each case, ECX is set to 0x7fffffff, >>>> and it's an operation using xmm ptr ecx+offset . >>>> >>>> Additionally, turning the optimization level passed to createJIT down >>>> appears to avoid it, so I'm now leaning towards a bug in one of the >>>> optimization passes. >>>> >>>> I'm going to dig through the passes controlled by that parameter and >>>> see if I can narrow down which optimization is causing it. 
>>>> >>>> Peter N >>>> >>>> >>>> On 17/07/2013 1:58 PM, Solomon Boulos wrote: >>>> >>>>> As someone off list just told me, perhaps my new bug is the same issue: >>>>> >>>>> http://llvm.org/bugs/show_bug.cgi?id=16640 >>>>> >>>>> Do you happen to be using FastISel? >>>>> >>>>> Solomon >>>>> >>>>> On Jul 16, 2013, at 6:39 PM, Peter Newman wrote: >>>>> >>>>> Hello all, >>>>>> >>>>>> I'm currently in the process of debugging a crash occurring in our >>>>>> program. In LLVM 3.2 and 3.3 it appears that JIT generated code is >>>>>> attempting to perform access unaligned memory with a SSE2 instruction. >>>>>> However this only happens under certain conditions that seem (but may not >>>>>> be) related to the stacks state on calling the function. >>>>>> >>>>>> Our program acts as a front-end, using the LLVM C++ API to generate a >>>>>> JIT generated function. This function is primarily mathematical, so we use >>>>>> the Vector types to take advantage of SIMD instructions (as well as a few >>>>>> SSE2 intrinsics). >>>>>> >>>>>> This worked in LLVM 2.8 but started failing in 3.2 and has continued >>>>>> to fail in 3.3. It fails with no optimizations applied to the LLVM >>>>>> Function/Module. It crashes with what is reported as a memory access error >>>>>> (accessing 0xffffffff), however it's suggested that this is how the SSE >>>>>> fault raising mechanism appears. >>>>>> >>>>>> The generated instruction varies, but it seems to often be similar to >>>>>> (I don't have it in front of me, sorry): >>>>>> movapd xmm0, xmm[ecx+0x???????] >>>>>> Where the xmm register changes, and the second parameter is a memory >>>>>> access. >>>>>> ECX is always set to 0x7ffffff - however I don't know if this is part >>>>>> of the SSE error reporting process or is part of the situation causing the >>>>>> error. >>>>>> >>>>>> I haven't worked out exactly what code path etc is causing this >>>>>> crash. 
I'm hoping that someone can tell me if there were any changed >>>>>> requirements for working with SIMD in LLVM 3.2 (or earlier, we haven't >>>>>> tried 3.0 or 3.1). I currently suspect the use of GlobalVariable (we first >>>>>> discovered the crash when using a feature that uses them), however I have >>>>>> attempted using setAlignment on the GlobalVariables without any change. >>>>>> >>>>>> -- >>>>>> Peter N >>>>>> _______________________________________________ >>>>>> LLVM Developers mailing list >>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>>> >>>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>> >>> >>> >>> >>> -- >>> ~Craig >>> >>> >>> >>> >> >> >> -- >> ~Craig >> >> >> > > > -- > ~Craig > > > -- ~Craig -------------- next part -------------- An HTML attachment was scrubbed... URL: From peter at uformia.com Thu Jul 18 22:53:44 2013 From: peter at uformia.com (Peter Newman) Date: Fri, 19 Jul 2013 15:53:44 +1000 Subject: [LLVMdev] SIMD instructions and memory alignment on X86 In-Reply-To: References: <51E5F5ED.1000808@uformia.com> <7CA02EA7-A9F2-4A91-A10B-24ED35111F3F@cs.stanford.edu> <51E789ED.2080509@uformia.com> <51E8BBE9.8020407@uformia.com> <51E8CADA.1040506@uformia.com> <51E8CE40.3050703@uformia.com> <51E8D26B.3000108@uformia.com> Message-ID: <51E8D468.7030509@uformia.com> Is there something specifically required to enable SSE? If it's not detected as available (based from the target triple?) then I don't think we enable it specifically. Also it seems that it should handle converting to/from the vector types, although I can see it getting confused about needing to do that if it thinks SSE isn't available at all. On 19/07/2013 3:47 PM, Craig Topper wrote: > Hmm, maybe sse isn't being enabled so its falling back to emulating sqrt? 
> > > On Thu, Jul 18, 2013 at 10:45 PM, Peter Newman > wrote: > > In the disassembly, I'm seeing three cases of > call 76719BA1 > > I am assuming this is the sqrt function as this is the only > function called in the LLVM IR. > > The code at 76719BA1 is: > > 76719BA1 push ebp > 76719BA2 mov ebp,esp > 76719BA4 sub esp,20h > 76719BA7 and esp,0FFFFFFF0h > 76719BAA fld st(0) > 76719BAC fst dword ptr [esp+18h] > 76719BB0 fistp qword ptr [esp+10h] > 76719BB4 fild qword ptr [esp+10h] > 76719BB8 mov edx,dword ptr [esp+18h] > 76719BBC mov eax,dword ptr [esp+10h] > 76719BC0 test eax,eax > 76719BC2 je 76719DCF > 76719BC8 fsubp st(1),st > 76719BCA test edx,edx > 76719BCC js 7671F9DB > 76719BD2 fstp dword ptr [esp] > 76719BD5 mov ecx,dword ptr [esp] > 76719BD8 add ecx,7FFFFFFFh > 76719BDE sbb eax,0 > 76719BE1 mov edx,dword ptr [esp+14h] > 76719BE5 sbb edx,0 > 76719BE8 leave > 76719BE9 ret > > > As you can see at 76719BD5, it modifies ECX . > > I don't know that this is the sqrtpd function (for example, I'm > not seeing any SSE instructions here?) but whatever it is, it's > being called from the IR I attached earlier, and is modifying ECX > under some circumstances. > > > On 19/07/2013 3:29 PM, Craig Topper wrote: >> That should map directly to sqrtpd which can't modify ecx. >> >> >> On Thu, Jul 18, 2013 at 10:27 PM, Peter Newman > > wrote: >> >> Sorry, that should have been llvm.x86.sse2.sqrt.pd >> >> >> On 19/07/2013 3:25 PM, Craig Topper wrote: >>> What is "frep.x86.sse2.sqrt.pd". I'm only familiar with >>> things prefixed with "llvm.x86". >>> >>> >>> On Thu, Jul 18, 2013 at 10:12 PM, Peter Newman >>> > wrote: >>> >>> After stepping through the produced assembly, I believe >>> I have a culprit. >>> >>> One of the calls to @frep.x86.sse2.sqrt.pd is modifying >>> the value of ECX - while the produced code is expecting >>> it to still contain its previous value. 
>>> >>> Peter N >>> >>> >>> On 19/07/2013 2:09 PM, Peter Newman wrote: >>>> I've attached the module->dump() that our code is >>>> producing. Unfortunately this is the smallest test case >>>> I have available. >>>> >>>> This is before any optimization passes are applied. >>>> There are two separate modules in existence at the >>>> time, and there are no guarantees about the order the >>>> surrounding code calls those functions, so there may be >>>> some interaction between them? There shouldn't be, they >>>> don't refer to any common memory etc. There is no >>>> multi-threading occurring. >>>> >>>> The function in module-dump.ll (called crashfunc in >>>> this file) is called with >>>> - func_params 0x0018f3b0 double [3] >>>> [0x0] -11.339976634695301 double >>>> [0x1] -9.7504239056205506 double >>>> [0x2] -5.2900856817382804 double >>>> at the time of the exception. >>>> >>>> This is compiled on a "i686-pc-win32" triple. All of >>>> the non-intrinsic functions referred to in these >>>> modules are the standard equivalents from the MSVC >>>> library (e.g. @asin is the standard C lib double >>>> asin( double ) ). >>>> >>>> Hopefully this is reproducible for you. >>>> >>>> -- >>>> PeterN >>>> >>>> On 18/07/2013 4:37 PM, Craig Topper wrote: >>>>> Are you able to send any IR for others to reproduce >>>>> this issue? >>>>> >>>>> >>>>> On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman >>>>> > wrote: >>>>> >>>>> Unfortunately, this doesn't appear to be the bug >>>>> I'm hitting. I applied the fix to my source and it >>>>> didn't make a difference. >>>>> >>>>> Also further testing found me getting the same >>>>> behavior with other SIMD instructions. The common >>>>> factor is in each case, ECX is set to 0x7fffffff, >>>>> and it's an operation using xmm ptr ecx+offset . >>>>> >>>>> Additionally, turning the optimization level >>>>> passed to createJIT down appears to avoid it, so >>>>> I'm now leaning towards a bug in one of the >>>>> optimization passes. 
>>>>> >>>>> I'm going to dig through the passes controlled by >>>>> that parameter and see if I can narrow down which >>>>> optimization is causing it. >>>>> >>>>> Peter N >>>>> >>>>> >>>>> On 17/07/2013 1:58 PM, Solomon Boulos wrote: >>>>> >>>>> As someone off list just told me, perhaps my >>>>> new bug is the same issue: >>>>> >>>>> http://llvm.org/bugs/show_bug.cgi?id=16640 >>>>> >>>>> Do you happen to be using FastISel? >>>>> >>>>> Solomon >>>>> >>>>> On Jul 16, 2013, at 6:39 PM, Peter Newman >>>>> > >>>>> wrote: >>>>> >>>>> Hello all, >>>>> >>>>> I'm currently in the process of debugging >>>>> a crash occurring in our program. In LLVM >>>>> 3.2 and 3.3 it appears that JIT generated >>>>> code is attempting to perform access >>>>> unaligned memory with a SSE2 instruction. >>>>> However this only happens under certain >>>>> conditions that seem (but may not be) >>>>> related to the stacks state on calling the >>>>> function. >>>>> >>>>> Our program acts as a front-end, using the >>>>> LLVM C++ API to generate a JIT generated >>>>> function. This function is primarily >>>>> mathematical, so we use the Vector types >>>>> to take advantage of SIMD instructions (as >>>>> well as a few SSE2 intrinsics). >>>>> >>>>> This worked in LLVM 2.8 but started >>>>> failing in 3.2 and has continued to fail >>>>> in 3.3. It fails with no optimizations >>>>> applied to the LLVM Function/Module. It >>>>> crashes with what is reported as a memory >>>>> access error (accessing 0xffffffff), >>>>> however it's suggested that this is how >>>>> the SSE fault raising mechanism appears. >>>>> >>>>> The generated instruction varies, but it >>>>> seems to often be similar to (I don't have >>>>> it in front of me, sorry): >>>>> movapd xmm0, xmm[ecx+0x???????] >>>>> Where the xmm register changes, and the >>>>> second parameter is a memory access. 
>>>>> ECX is always set to 0x7ffffff - however I >>>>> don't know if this is part of the SSE >>>>> error reporting process or is part of the >>>>> situation causing the error. >>>>> >>>>> I haven't worked out exactly what code >>>>> path etc is causing this crash. I'm hoping >>>>> that someone can tell me if there were any >>>>> changed requirements for working with SIMD >>>>> in LLVM 3.2 (or earlier, we haven't tried >>>>> 3.0 or 3.1). I currently suspect the use >>>>> of GlobalVariable (we first discovered the >>>>> crash when using a feature that uses >>>>> them), however I have attempted using >>>>> setAlignment on the GlobalVariables >>>>> without any change. >>>>> >>>>> -- >>>>> Peter N >>>>> _______________________________________________ >>>>> LLVM Developers mailing list >>>>> LLVMdev at cs.uiuc.edu >>>>> >>>>> http://llvm.cs.uiuc.edu >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>> >>>>> >>>>> _______________________________________________ >>>>> LLVM Developers mailing list >>>>> LLVMdev at cs.uiuc.edu >>>>> http://llvm.cs.uiuc.edu >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> ~Craig >>>> >>> >>> >>> >>> >>> -- >>> ~Craig >> >> >> >> >> -- >> ~Craig > > > > > -- > ~Craig -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From craig.topper at gmail.com Thu Jul 18 23:00:41 2013 From: craig.topper at gmail.com (Craig Topper) Date: Thu, 18 Jul 2013 23:00:41 -0700 Subject: [LLVMdev] SIMD instructions and memory alignment on X86 In-Reply-To: <51E8D468.7030509@uformia.com> References: <51E5F5ED.1000808@uformia.com> <7CA02EA7-A9F2-4A91-A10B-24ED35111F3F@cs.stanford.edu> <51E789ED.2080509@uformia.com> <51E8BBE9.8020407@uformia.com> <51E8CADA.1040506@uformia.com> <51E8CE40.3050703@uformia.com> <51E8D26B.3000108@uformia.com> <51E8D468.7030509@uformia.com> Message-ID: Hmm, I'm not able to get those .ll files to compile if I disable SSE and I end up with SSE instructions (including sqrtpd) if I don't disable it. On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman wrote: > Is there something specifically required to enable SSE? If it's not > detected as available (based on the target triple?) then I don't think we > enable it specifically. > > Also it seems that it should handle converting to/from the vector types, > although I can see it getting confused about needing to do that if it > thinks SSE isn't available at all. > > > On 19/07/2013 3:47 PM, Craig Topper wrote: > > Hmm, maybe SSE isn't being enabled so it's falling back to emulating sqrt? > > > On Thu, Jul 18, 2013 at 10:45 PM, Peter Newman wrote: > >> In the disassembly, I'm seeing three cases of >> call 76719BA1 >> >> I am assuming this is the sqrt function as this is the only function >> called in the LLVM IR.
>> >> The code at 76719BA1 is: >> >> 76719BA1 push ebp >> 76719BA2 mov ebp,esp >> 76719BA4 sub esp,20h >> 76719BA7 and esp,0FFFFFFF0h >> 76719BAA fld st(0) >> 76719BAC fst dword ptr [esp+18h] >> 76719BB0 fistp qword ptr [esp+10h] >> 76719BB4 fild qword ptr [esp+10h] >> 76719BB8 mov edx,dword ptr [esp+18h] >> 76719BBC mov eax,dword ptr [esp+10h] >> 76719BC0 test eax,eax >> 76719BC2 je 76719DCF >> 76719BC8 fsubp st(1),st >> 76719BCA test edx,edx >> 76719BCC js 7671F9DB >> 76719BD2 fstp dword ptr [esp] >> 76719BD5 mov ecx,dword ptr [esp] >> 76719BD8 add ecx,7FFFFFFFh >> 76719BDE sbb eax,0 >> 76719BE1 mov edx,dword ptr [esp+14h] >> 76719BE5 sbb edx,0 >> 76719BE8 leave >> 76719BE9 ret >> >> >> As you can see at 76719BD5, it modifies ECX . >> >> I don't know that this is the sqrtpd function (for example, I'm not >> seeing any SSE instructions here?) but whatever it is, it's being called >> from the IR I attached earlier, and is modifying ECX under some >> circumstances. >> >> >> On 19/07/2013 3:29 PM, Craig Topper wrote: >> >> That should map directly to sqrtpd which can't modify ecx. >> >> >> On Thu, Jul 18, 2013 at 10:27 PM, Peter Newman wrote: >> >>> Sorry, that should have been llvm.x86.sse2.sqrt.pd >>> >>> >>> On 19/07/2013 3:25 PM, Craig Topper wrote: >>> >>> What is "frep.x86.sse2.sqrt.pd". I'm only familiar with things prefixed >>> with "llvm.x86". >>> >>> >>> On Thu, Jul 18, 2013 at 10:12 PM, Peter Newman wrote: >>> >>>> After stepping through the produced assembly, I believe I have a >>>> culprit. >>>> >>>> One of the calls to @frep.x86.sse2.sqrt.pd is modifying the value of >>>> ECX - while the produced code is expecting it to still contain its previous >>>> value. >>>> >>>> Peter N >>>> >>>> >>>> On 19/07/2013 2:09 PM, Peter Newman wrote: >>>> >>>> I've attached the module->dump() that our code is producing. >>>> Unfortunately this is the smallest test case I have available. >>>> >>>> This is before any optimization passes are applied. 
There are two >>>> separate modules in existence at the time, and there are no guarantees >>>> about the order the surrounding code calls those functions, so there may be >>>> some interaction between them? There shouldn't be, they don't refer to any >>>> common memory etc. There is no multi-threading occurring. >>>> >>>> The function in module-dump.ll (called crashfunc in this file) is >>>> called with >>>> - func_params 0x0018f3b0 double [3] >>>> [0x0] -11.339976634695301 double >>>> [0x1] -9.7504239056205506 double >>>> [0x2] -5.2900856817382804 double >>>> at the time of the exception. >>>> >>>> This is compiled on a "i686-pc-win32" triple. All of the non-intrinsic >>>> functions referred to in these modules are the standard equivalents from >>>> the MSVC library (e.g. @asin is the standard C lib double asin( double ) >>>> ). >>>> >>>> Hopefully this is reproducible for you. >>>> >>>> -- >>>> PeterN >>>> >>>> On 18/07/2013 4:37 PM, Craig Topper wrote: >>>> >>>> Are you able to send any IR for others to reproduce this issue? >>>> >>>> >>>> On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman wrote: >>>> >>>>> Unfortunately, this doesn't appear to be the bug I'm hitting. I >>>>> applied the fix to my source and it didn't make a difference. >>>>> >>>>> Also further testing found me getting the same behavior with other >>>>> SIMD instructions. The common factor is in each case, ECX is set to >>>>> 0x7fffffff, and it's an operation using xmm ptr ecx+offset . >>>>> >>>>> Additionally, turning the optimization level passed to createJIT down >>>>> appears to avoid it, so I'm now leaning towards a bug in one of the >>>>> optimization passes. >>>>> >>>>> I'm going to dig through the passes controlled by that parameter and >>>>> see if I can narrow down which optimization is causing it. 
>>>>> >>>>> Peter N >>>>> >>>>> >>>>> On 17/07/2013 1:58 PM, Solomon Boulos wrote: >>>>> >>>>>> As someone off list just told me, perhaps my new bug is the same >>>>>> issue: >>>>>> >>>>>> http://llvm.org/bugs/show_bug.cgi?id=16640 >>>>>> >>>>>> Do you happen to be using FastISel? >>>>>> >>>>>> Solomon >>>>>> >>>>>> On Jul 16, 2013, at 6:39 PM, Peter Newman wrote: >>>>>> >>>>>> Hello all, >>>>>>> >>>>>>> I'm currently in the process of debugging a crash occurring in our >>>>>>> program. In LLVM 3.2 and 3.3 it appears that JIT generated code is >>>>>>> attempting to perform access unaligned memory with a SSE2 instruction. >>>>>>> However this only happens under certain conditions that seem (but may not >>>>>>> be) related to the stacks state on calling the function. >>>>>>> >>>>>>> Our program acts as a front-end, using the LLVM C++ API to generate >>>>>>> a JIT generated function. This function is primarily mathematical, so we >>>>>>> use the Vector types to take advantage of SIMD instructions (as well as a >>>>>>> few SSE2 intrinsics). >>>>>>> >>>>>>> This worked in LLVM 2.8 but started failing in 3.2 and has continued >>>>>>> to fail in 3.3. It fails with no optimizations applied to the LLVM >>>>>>> Function/Module. It crashes with what is reported as a memory access error >>>>>>> (accessing 0xffffffff), however it's suggested that this is how the SSE >>>>>>> fault raising mechanism appears. >>>>>>> >>>>>>> The generated instruction varies, but it seems to often be similar >>>>>>> to (I don't have it in front of me, sorry): >>>>>>> movapd xmm0, xmm[ecx+0x???????] >>>>>>> Where the xmm register changes, and the second parameter is a memory >>>>>>> access. >>>>>>> ECX is always set to 0x7ffffff - however I don't know if this is >>>>>>> part of the SSE error reporting process or is part of the situation causing >>>>>>> the error. >>>>>>> >>>>>>> I haven't worked out exactly what code path etc is causing this >>>>>>> crash. 
I'm hoping that someone can tell me if there were any changed >>>>>>> requirements for working with SIMD in LLVM 3.2 (or earlier, we haven't >>>>>>> tried 3.0 or 3.1). I currently suspect the use of GlobalVariable (we first >>>>>>> discovered the crash when using a feature that uses them), however I have >>>>>>> attempted using setAlignment on the GlobalVariables without any change. >>>>>>> >>>>>>> -- >>>>>>> Peter N >>>>>>> _______________________________________________ >>>>>>> LLVM Developers mailing list >>>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>>>> >>>>>> >>>>> _______________________________________________ >>>>> LLVM Developers mailing list >>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>> >>>> >>>> >>>> >>>> -- >>>> ~Craig >>>> >>>> >>>> >>>> >>> >>> >>> -- >>> ~Craig >>> >>> >>> >> >> >> -- >> ~Craig >> >> >> > > > -- > ~Craig > > > -- ~Craig -------------- next part -------------- An HTML attachment was scrubbed... URL: From craig.topper at gmail.com Thu Jul 18 23:59:13 2013 From: craig.topper at gmail.com (Craig Topper) Date: Thu, 18 Jul 2013 23:59:13 -0700 Subject: [LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX In-Reply-To: <51E8DE10.9090900@uformia.com> References: <51E5F5ED.1000808@uformia.com> <7CA02EA7-A9F2-4A91-A10B-24ED35111F3F@cs.stanford.edu> <51E789ED.2080509@uformia.com> <51E8BBE9.8020407@uformia.com> <51E8CADA.1040506@uformia.com> <51E8CE40.3050703@uformia.com> <51E8D26B.3000108@uformia.com> <51E8D468.7030509@uformia.com> <51E8DE10.9090900@uformia.com> Message-ID: The calls represent the MSVC _ftol2 function I think. On Thu, Jul 18, 2013 at 11:34 PM, Peter Newman wrote: > (Changing subject line as diagnosis has changed) > > I'm attaching the compiled code that I've been getting, both with > CodeGenOpt::Default and CodeGenOpt::None . 
The crash isn't occurring with > CodeGenOpt::None, but that seems to be because ECX isn't being used - it > still gets set to 0x7fffffff by one of the calls to 76719BA1 > > I notice that X86::SQRTPD[m|r] appear in X86InstrInfo::isHighLatencyDef. I > was thinking an optimization might be removing it, but I don't get the > sqrtpd instruction even if the createJIT optimization level is turned off. > > I am trying this with the Release 3.3 code - I'll try it with trunk and > see if I get a different result there. Maybe there was a recent commit for > this. > > -- > Peter N > > On 19/07/2013 4:00 PM, Craig Topper wrote: > > Hmm, I'm not able to get those .ll files to compile if I disable SSE and I > end up with SSE instructions (including sqrtpd) if I don't disable it. > > > On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman wrote: > >> Is there something specifically required to enable SSE? If it's not >> detected as available (based on the target triple?) then I don't think we >> enable it specifically. >> >> Also it seems that it should handle converting to/from the vector types, >> although I can see it getting confused about needing to do that if it >> thinks SSE isn't available at all. >> >> >> On 19/07/2013 3:47 PM, Craig Topper wrote: >> >> Hmm, maybe SSE isn't being enabled so it's falling back to emulating sqrt? >> >> >> On Thu, Jul 18, 2013 at 10:45 PM, Peter Newman wrote: >> >>> In the disassembly, I'm seeing three cases of >>> call 76719BA1 >>> >>> I am assuming this is the sqrt function as this is the only function >>> called in the LLVM IR.
>>> >>> The code at 76719BA1 is: >>> >>> 76719BA1 push ebp >>> 76719BA2 mov ebp,esp >>> 76719BA4 sub esp,20h >>> 76719BA7 and esp,0FFFFFFF0h >>> 76719BAA fld st(0) >>> 76719BAC fst dword ptr [esp+18h] >>> 76719BB0 fistp qword ptr [esp+10h] >>> 76719BB4 fild qword ptr [esp+10h] >>> 76719BB8 mov edx,dword ptr [esp+18h] >>> 76719BBC mov eax,dword ptr [esp+10h] >>> 76719BC0 test eax,eax >>> 76719BC2 je 76719DCF >>> 76719BC8 fsubp st(1),st >>> 76719BCA test edx,edx >>> 76719BCC js 7671F9DB >>> 76719BD2 fstp dword ptr [esp] >>> 76719BD5 mov ecx,dword ptr [esp] >>> 76719BD8 add ecx,7FFFFFFFh >>> 76719BDE sbb eax,0 >>> 76719BE1 mov edx,dword ptr [esp+14h] >>> 76719BE5 sbb edx,0 >>> 76719BE8 leave >>> 76719BE9 ret >>> >>> >>> As you can see at 76719BD5, it modifies ECX . >>> >>> I don't know that this is the sqrtpd function (for example, I'm not >>> seeing any SSE instructions here?) but whatever it is, it's being called >>> from the IR I attached earlier, and is modifying ECX under some >>> circumstances. >>> >>> >>> On 19/07/2013 3:29 PM, Craig Topper wrote: >>> >>> That should map directly to sqrtpd which can't modify ecx. >>> >>> >>> On Thu, Jul 18, 2013 at 10:27 PM, Peter Newman wrote: >>> >>>> Sorry, that should have been llvm.x86.sse2.sqrt.pd >>>> >>>> >>>> On 19/07/2013 3:25 PM, Craig Topper wrote: >>>> >>>> What is "frep.x86.sse2.sqrt.pd". I'm only familiar with things prefixed >>>> with "llvm.x86". >>>> >>>> >>>> On Thu, Jul 18, 2013 at 10:12 PM, Peter Newman wrote: >>>> >>>>> After stepping through the produced assembly, I believe I have a >>>>> culprit. >>>>> >>>>> One of the calls to @frep.x86.sse2.sqrt.pd is modifying the value of >>>>> ECX - while the produced code is expecting it to still contain its previous >>>>> value. >>>>> >>>>> Peter N >>>>> >>>>> >>>>> On 19/07/2013 2:09 PM, Peter Newman wrote: >>>>> >>>>> I've attached the module->dump() that our code is producing. >>>>> Unfortunately this is the smallest test case I have available. 
>>>>> >>>>> This is before any optimization passes are applied. There are two >>>>> separate modules in existence at the time, and there are no guarantees >>>>> about the order the surrounding code calls those functions, so there may be >>>>> some interaction between them? There shouldn't be, they don't refer to any >>>>> common memory etc. There is no multi-threading occurring. >>>>> >>>>> The function in module-dump.ll (called crashfunc in this file) is >>>>> called with >>>>> - func_params 0x0018f3b0 double [3] >>>>> [0x0] -11.339976634695301 double >>>>> [0x1] -9.7504239056205506 double >>>>> [0x2] -5.2900856817382804 double >>>>> at the time of the exception. >>>>> >>>>> This is compiled on a "i686-pc-win32" triple. All of the non-intrinsic >>>>> functions referred to in these modules are the standard equivalents from >>>>> the MSVC library (e.g. @asin is the standard C lib double asin( double ) >>>>> ). >>>>> >>>>> Hopefully this is reproducible for you. >>>>> >>>>> -- >>>>> PeterN >>>>> >>>>> On 18/07/2013 4:37 PM, Craig Topper wrote: >>>>> >>>>> Are you able to send any IR for others to reproduce this issue? >>>>> >>>>> >>>>> On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman wrote: >>>>> >>>>>> Unfortunately, this doesn't appear to be the bug I'm hitting. I >>>>>> applied the fix to my source and it didn't make a difference. >>>>>> >>>>>> Also further testing found me getting the same behavior with other >>>>>> SIMD instructions. The common factor is in each case, ECX is set to >>>>>> 0x7fffffff, and it's an operation using xmm ptr ecx+offset . >>>>>> >>>>>> Additionally, turning the optimization level passed to createJIT down >>>>>> appears to avoid it, so I'm now leaning towards a bug in one of the >>>>>> optimization passes. >>>>>> >>>>>> I'm going to dig through the passes controlled by that parameter and >>>>>> see if I can narrow down which optimization is causing it. 
>>>>>> >>>>>> Peter N >>>>>> >>>>>> >>>>>> On 17/07/2013 1:58 PM, Solomon Boulos wrote: >>>>>> >>>>>>> As someone off list just told me, perhaps my new bug is the same >>>>>>> issue: >>>>>>> >>>>>>> http://llvm.org/bugs/show_bug.cgi?id=16640 >>>>>>> >>>>>>> Do you happen to be using FastISel? >>>>>>> >>>>>>> Solomon >>>>>>> >>>>>>> On Jul 16, 2013, at 6:39 PM, Peter Newman wrote: >>>>>>> >>>>>>> Hello all, >>>>>>>> >>>>>>>> I'm currently in the process of debugging a crash occurring in our >>>>>>>> program. In LLVM 3.2 and 3.3 it appears that JIT generated code is >>>>>>>> attempting to perform access unaligned memory with a SSE2 instruction. >>>>>>>> However this only happens under certain conditions that seem (but may not >>>>>>>> be) related to the stacks state on calling the function. >>>>>>>> >>>>>>>> Our program acts as a front-end, using the LLVM C++ API to generate >>>>>>>> a JIT generated function. This function is primarily mathematical, so we >>>>>>>> use the Vector types to take advantage of SIMD instructions (as well as a >>>>>>>> few SSE2 intrinsics). >>>>>>>> >>>>>>>> This worked in LLVM 2.8 but started failing in 3.2 and has >>>>>>>> continued to fail in 3.3. It fails with no optimizations applied to the >>>>>>>> LLVM Function/Module. It crashes with what is reported as a memory access >>>>>>>> error (accessing 0xffffffff), however it's suggested that this is how the >>>>>>>> SSE fault raising mechanism appears. >>>>>>>> >>>>>>>> The generated instruction varies, but it seems to often be similar >>>>>>>> to (I don't have it in front of me, sorry): >>>>>>>> movapd xmm0, xmm[ecx+0x???????] >>>>>>>> Where the xmm register changes, and the second parameter is a >>>>>>>> memory access. >>>>>>>> ECX is always set to 0x7ffffff - however I don't know if this is >>>>>>>> part of the SSE error reporting process or is part of the situation causing >>>>>>>> the error. 
>>>>>>>> >>>>>>>> I haven't worked out exactly what code path etc is causing this >>>>>>>> crash. I'm hoping that someone can tell me if there were any changed >>>>>>>> requirements for working with SIMD in LLVM 3.2 (or earlier, we haven't >>>>>>>> tried 3.0 or 3.1). I currently suspect the use of GlobalVariable (we first >>>>>>>> discovered the crash when using a feature that uses them), however I have >>>>>>>> attempted using setAlignment on the GlobalVariables without any change. >>>>>>>> >>>>>>>> -- >>>>>>>> Peter N >>>>>>>> _______________________________________________ >>>>>>>> LLVM Developers mailing list >>>>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>>>>> >>>>>>> >>>>>> _______________________________________________ >>>>>> LLVM Developers mailing list >>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> ~Craig >>>>> >>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> ~Craig >>>> >>>> >>>> >>> >>> >>> -- >>> ~Craig >>> >>> >>> >> >> >> -- >> ~Craig >> >> >> > > > -- > ~Craig > > > -- ~Craig -------------- next part -------------- An HTML attachment was scrubbed... URL: From peter at uformia.com Thu Jul 18 23:59:28 2013 From: peter at uformia.com (Peter Newman) Date: Fri, 19 Jul 2013 16:59:28 +1000 Subject: [LLVMdev] fptoui calling a function that modifies ECX In-Reply-To: References: <51E5F5ED.1000808@uformia.com> <7CA02EA7-A9F2-4A91-A10B-24ED35111F3F@cs.stanford.edu> <51E789ED.2080509@uformia.com> <51E8BBE9.8020407@uformia.com> <51E8CADA.1040506@uformia.com> <51E8CE40.3050703@uformia.com> <51E8D26B.3000108@uformia.com> <51E8D468.7030509@uformia.com> <51E8DE10.9090900@uformia.com> <51E8E123.2030402@uformia.com> Message-ID: <51E8E3D0.5040807@uformia.com> Oh, excellent point, I agree. My bad. Now that I'm not assuming those are the sqrt, I see the sqrtpd's in the output. 
Also there are three fptoui's and there are 3 call instances. (Changing subject line again.) Now it looks like it's bug #13862 On 19/07/2013 4:51 PM, Craig Topper wrote: > I think those calls correspond to this > > %110 = fptoui double %109 to i32 > > The calls are followed by an imul with 12 which matches up with what > occurs right after the fptoui in the IR. > > > On Thu, Jul 18, 2013 at 11:48 PM, Peter Newman > wrote: > > Yes, that is the result of module-dump.ll > > > On 19/07/2013 4:46 PM, Craig Topper wrote: >> Does this correspond to one of the .ll files you sent earlier? >> >> >> On Thu, Jul 18, 2013 at 11:34 PM, Peter Newman > > wrote: >> >> (Changing subject line as diagnosis has changed) >> >> I'm attaching the compiled code that I've been getting, both >> with CodeGenOpt::Default and CodeGenOpt::None . The crash >> isn't occurring with CodeGenOpt::None, but that seems to be >> because ECX isn't being used - it still gets set to >> 0x7fffffff by one of the calls to 76719BA1 >> >> I notice that X86::SQRTPD[m|r] appear in >> X86InstrInfo::isHighLatencyDef. I was thinking an >> optimization might be removing it, but I don't get the sqrtpd >> instruction even if the createJIT optimization level turned off. >> >> I am trying this with the Release 3.3 code - I'll try it with >> trunk and see if I get a different result there. Maybe there >> was a recent commit for this. >> >> -- >> Peter N >> >> On 19/07/2013 4:00 PM, Craig Topper wrote: >>> Hmm, I'm not able to get those .ll files to compile if I >>> disable SSE and I end up with SSE instructions(including >>> sqrtpd) if I don't disable it. >>> >>> >>> On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman >>> > wrote: >>> >>> Is there something specifically required to enable SSE? >>> If it's not detected as available (based from the target >>> triple?) then I don't think we enable it specifically. 
>>> >>> Also it seems that it should handle converting to/from >>> the vector types, although I can see it getting confused >>> about needing to do that if it thinks SSE isn't >>> available at all. >>> >>> >>> On 19/07/2013 3:47 PM, Craig Topper wrote: >>>> Hmm, maybe sse isn't being enabled so its falling back >>>> to emulating sqrt? >>>> >>>> >>>> On Thu, Jul 18, 2013 at 10:45 PM, Peter Newman >>>> > wrote: >>>> >>>> In the disassembly, I'm seeing three cases of >>>> call 76719BA1 >>>> >>>> I am assuming this is the sqrt function as this is >>>> the only function called in the LLVM IR. >>>> >>>> The code at 76719BA1 is: >>>> >>>> 76719BA1 push ebp >>>> 76719BA2 mov ebp,esp >>>> 76719BA4 sub esp,20h >>>> 76719BA7 and esp,0FFFFFFF0h >>>> 76719BAA fld st(0) >>>> 76719BAC fst dword ptr [esp+18h] >>>> 76719BB0 fistp qword ptr [esp+10h] >>>> 76719BB4 fild qword ptr [esp+10h] >>>> 76719BB8 mov edx,dword ptr [esp+18h] >>>> 76719BBC mov eax,dword ptr [esp+10h] >>>> 76719BC0 test eax,eax >>>> 76719BC2 je 76719DCF >>>> 76719BC8 fsubp st(1),st >>>> 76719BCA test edx,edx >>>> 76719BCC js 7671F9DB >>>> 76719BD2 fstp dword ptr [esp] >>>> 76719BD5 mov ecx,dword ptr [esp] >>>> 76719BD8 add ecx,7FFFFFFFh >>>> 76719BDE sbb eax,0 >>>> 76719BE1 mov edx,dword ptr [esp+14h] >>>> 76719BE5 sbb edx,0 >>>> 76719BE8 leave >>>> 76719BE9 ret >>>> >>>> >>>> As you can see at 76719BD5, it modifies ECX . >>>> >>>> I don't know that this is the sqrtpd function (for >>>> example, I'm not seeing any SSE instructions here?) >>>> but whatever it is, it's being called from the IR I >>>> attached earlier, and is modifying ECX under some >>>> circumstances. >>>> >>>> >>>> On 19/07/2013 3:29 PM, Craig Topper wrote: >>>>> That should map directly to sqrtpd which can't >>>>> modify ecx. 
>>>>> >>>>> >>>>> On Thu, Jul 18, 2013 at 10:27 PM, Peter Newman >>>>> > wrote: >>>>> >>>>> Sorry, that should have been >>>>> llvm.x86.sse2.sqrt.pd >>>>> >>>>> >>>>> On 19/07/2013 3:25 PM, Craig Topper wrote: >>>>>> What is "frep.x86.sse2.sqrt.pd". I'm only >>>>>> familiar with things prefixed with "llvm.x86". >>>>>> >>>>>> >>>>>> On Thu, Jul 18, 2013 at 10:12 PM, Peter >>>>>> Newman >>>>> > wrote: >>>>>> >>>>>> After stepping through the produced >>>>>> assembly, I believe I have a culprit. >>>>>> >>>>>> One of the calls to >>>>>> @frep.x86.sse2.sqrt.pd is modifying the >>>>>> value of ECX - while the produced code is >>>>>> expecting it to still contain its >>>>>> previous value. >>>>>> >>>>>> Peter N >>>>>> >>>>>> >>>>>> On 19/07/2013 2:09 PM, Peter Newman wrote: >>>>>>> I've attached the module->dump() that >>>>>>> our code is producing. Unfortunately >>>>>>> this is the smallest test case I have >>>>>>> available. >>>>>>> >>>>>>> This is before any optimization passes >>>>>>> are applied. There are two separate >>>>>>> modules in existence at the time, and >>>>>>> there are no guarantees about the order >>>>>>> the surrounding code calls those >>>>>>> functions, so there may be some >>>>>>> interaction between them? There >>>>>>> shouldn't be, they don't refer to any >>>>>>> common memory etc. There is no >>>>>>> multi-threading occurring. >>>>>>> >>>>>>> The function in module-dump.ll (called >>>>>>> crashfunc in this file) is called with >>>>>>> - func_params 0x0018f3b0 double [3] >>>>>>> [0x0] -11.339976634695301 double >>>>>>> [0x1] -9.7504239056205506 double >>>>>>> [0x2] -5.2900856817382804 double >>>>>>> at the time of the exception. >>>>>>> >>>>>>> This is compiled on a "i686-pc-win32" >>>>>>> triple. All of the non-intrinsic >>>>>>> functions referred to in these modules >>>>>>> are the standard equivalents from the >>>>>>> MSVC library (e.g. @asin is the standard >>>>>>> C lib double asin( double ) ). 
>>>>>>> >>>>>>> Hopefully this is reproducible for you. >>>>>>> >>>>>>> -- >>>>>>> PeterN >>>>>>> >>>>>>> On 18/07/2013 4:37 PM, Craig Topper wrote: >>>>>>>> Are you able to send any IR for others >>>>>>>> to reproduce this issue? >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Jul 17, 2013 at 11:23 PM, Peter >>>>>>>> Newman >>>>>>> > wrote: >>>>>>>> >>>>>>>> Unfortunately, this doesn't appear >>>>>>>> to be the bug I'm hitting. I >>>>>>>> applied the fix to my source and it >>>>>>>> didn't make a difference. >>>>>>>> >>>>>>>> Also further testing found me >>>>>>>> getting the same behavior with >>>>>>>> other SIMD instructions. The common >>>>>>>> factor is in each case, ECX is set >>>>>>>> to 0x7fffffff, and it's an >>>>>>>> operation using xmm ptr ecx+offset . >>>>>>>> >>>>>>>> Additionally, turning the >>>>>>>> optimization level passed to >>>>>>>> createJIT down appears to avoid it, >>>>>>>> so I'm now leaning towards a bug in >>>>>>>> one of the optimization passes. >>>>>>>> >>>>>>>> I'm going to dig through the passes >>>>>>>> controlled by that parameter and >>>>>>>> see if I can narrow down which >>>>>>>> optimization is causing it. >>>>>>>> >>>>>>>> Peter N >>>>>>>> >>>>>>>> >>>>>>>> On 17/07/2013 1:58 PM, Solomon >>>>>>>> Boulos wrote: >>>>>>>> >>>>>>>> As someone off list just told >>>>>>>> me, perhaps my new bug is the >>>>>>>> same issue: >>>>>>>> >>>>>>>> http://llvm.org/bugs/show_bug.cgi?id=16640 >>>>>>>> >>>>>>>> Do you happen to be using FastISel? >>>>>>>> >>>>>>>> Solomon >>>>>>>> >>>>>>>> On Jul 16, 2013, at 6:39 PM, >>>>>>>> Peter Newman >>>>>>> > wrote: >>>>>>>> >>>>>>>> Hello all, >>>>>>>> >>>>>>>> I'm currently in the >>>>>>>> process of debugging a >>>>>>>> crash occurring in our >>>>>>>> program. In LLVM 3.2 and >>>>>>>> 3.3 it appears that JIT >>>>>>>> generated code is >>>>>>>> attempting to perform >>>>>>>> access unaligned memory >>>>>>>> with a SSE2 instruction. 
>>>>>>>> However this only happens >>>>>>>> under certain conditions >>>>>>>> that seem (but may not be) >>>>>>>> related to the stacks state >>>>>>>> on calling the function. >>>>>>>> >>>>>>>> Our program acts as a >>>>>>>> front-end, using the LLVM >>>>>>>> C++ API to generate a JIT >>>>>>>> generated function. This >>>>>>>> function is primarily >>>>>>>> mathematical, so we use the >>>>>>>> Vector types to take >>>>>>>> advantage of SIMD >>>>>>>> instructions (as well as a >>>>>>>> few SSE2 intrinsics). >>>>>>>> >>>>>>>> This worked in LLVM 2.8 but >>>>>>>> started failing in 3.2 and >>>>>>>> has continued to fail in >>>>>>>> 3.3. It fails with no >>>>>>>> optimizations applied to >>>>>>>> the LLVM Function/Module. >>>>>>>> It crashes with what is >>>>>>>> reported as a memory access >>>>>>>> error (accessing >>>>>>>> 0xffffffff), however it's >>>>>>>> suggested that this is how >>>>>>>> the SSE fault raising >>>>>>>> mechanism appears. >>>>>>>> >>>>>>>> The generated instruction >>>>>>>> varies, but it seems to >>>>>>>> often be similar to (I >>>>>>>> don't have it in front of >>>>>>>> me, sorry): >>>>>>>> movapd xmm0, xmm[ecx+0x???????] >>>>>>>> Where the xmm register >>>>>>>> changes, and the second >>>>>>>> parameter is a memory access. >>>>>>>> ECX is always set to >>>>>>>> 0x7ffffff - however I don't >>>>>>>> know if this is part of the >>>>>>>> SSE error reporting process >>>>>>>> or is part of the situation >>>>>>>> causing the error. >>>>>>>> >>>>>>>> I haven't worked out >>>>>>>> exactly what code path etc >>>>>>>> is causing this crash. I'm >>>>>>>> hoping that someone can >>>>>>>> tell me if there were any >>>>>>>> changed requirements for >>>>>>>> working with SIMD in LLVM >>>>>>>> 3.2 (or earlier, we haven't >>>>>>>> tried 3.0 or 3.1). 
I >>>>>>>> currently suspect the use >>>>>>>> of GlobalVariable (we first >>>>>>>> discovered the crash when >>>>>>>> using a feature that uses >>>>>>>> them), however I have >>>>>>>> attempted using >>>>>>>> setAlignment on the >>>>>>>> GlobalVariables without any >>>>>>>> change. >>>>>>>> >>>>>>>> -- >>>>>>>> Peter N >>>>>>>> _______________________________________________ >>>>>>>> LLVM Developers mailing list >>>>>>>> LLVMdev at cs.uiuc.edu >>>>>>>> http://llvm.cs.uiuc.edu >>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> LLVM Developers mailing list >>>>>>>> LLVMdev at cs.uiuc.edu >>>>>>>> >>>>>>>> http://llvm.cs.uiuc.edu >>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> ~Craig >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> ~Craig >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> ~Craig >>>> >>>> >>>> >>>> >>>> -- >>>> ~Craig >>> >>> >>> >>> >>> -- >>> ~Craig >> >> >> >> >> -- >> ~Craig > > > > > -- > ~Craig -------------- next part -------------- An HTML attachment was scrubbed... URL: From craig.topper at gmail.com Fri Jul 19 00:23:02 2013 From: craig.topper at gmail.com (Craig Topper) Date: Fri, 19 Jul 2013 00:23:02 -0700 Subject: [LLVMdev] fptoui calling a function that modifies ECX In-Reply-To: <51E8E3D0.5040807@uformia.com> References: <51E5F5ED.1000808@uformia.com> <7CA02EA7-A9F2-4A91-A10B-24ED35111F3F@cs.stanford.edu> <51E789ED.2080509@uformia.com> <51E8BBE9.8020407@uformia.com> <51E8CADA.1040506@uformia.com> <51E8CE40.3050703@uformia.com> <51E8D26B.3000108@uformia.com> <51E8D468.7030509@uformia.com> <51E8DE10.9090900@uformia.com> <51E8E123.2030402@uformia.com> <51E8E3D0.5040807@uformia.com> Message-ID: Try adding ECX to the Defs of this part of lib/Target/X86/X86InstrCompiler.td like I've done below. I don't have a Windows machine to test myself. 
let Defs = [EAX, EDX, ECX, EFLAGS], FPForm = SpecialFP in { def WIN_FTOL_32 : I<0, Pseudo, (outs), (ins RFP32:$src), "# win32 fptoui", [(X86WinFTOL RFP32:$src)]>, Requires<[In32BitMode]>; def WIN_FTOL_64 : I<0, Pseudo, (outs), (ins RFP64:$src), "# win32 fptoui", [(X86WinFTOL RFP64:$src)]>, Requires<[In32BitMode]>; } On Thu, Jul 18, 2013 at 11:59 PM, Peter Newman wrote: > Oh, excellent point, I agree. My bad. Now that I'm not assuming those > are the sqrt, I see the sqrtpd's in the output. Also there are three > fptoui's and there are 3 call instances. > > (Changing subject line again.) > > Now it looks like it's bug #13862 > > On 19/07/2013 4:51 PM, Craig Topper wrote: > > I think those calls correspond to this > > %110 = fptoui double %109 to i32 > > The calls are followed by an imul with 12 which matches up with what > occurs right after the fptoui in the IR. > > > On Thu, Jul 18, 2013 at 11:48 PM, Peter Newman wrote: > >> Yes, that is the result of module-dump.ll >> >> >> On 19/07/2013 4:46 PM, Craig Topper wrote: >> >> Does this correspond to one of the .ll files you sent earlier? >> >> >> On Thu, Jul 18, 2013 at 11:34 PM, Peter Newman wrote: >> >>> (Changing subject line as diagnosis has changed) >>> >>> I'm attaching the compiled code that I've been getting, both with >>> CodeGenOpt::Default and CodeGenOpt::None . The crash isn't occurring with >>> CodeGenOpt::None, but that seems to be because ECX isn't being used - it >>> still gets set to 0x7fffffff by one of the calls to 76719BA1 >>> >>> I notice that X86::SQRTPD[m|r] appear in X86InstrInfo::isHighLatencyDef. >>> I was thinking an optimization might be removing it, but I don't get the >>> sqrtpd instruction even if the createJIT optimization level turned off. >>> >>> I am trying this with the Release 3.3 code - I'll try it with trunk and >>> see if I get a different result there. Maybe there was a recent commit for >>> this. 
>>> >>> -- >>> Peter N >>> >>> On 19/07/2013 4:00 PM, Craig Topper wrote: >>> >>> Hmm, I'm not able to get those .ll files to compile if I disable SSE and >>> I end up with SSE instructions(including sqrtpd) if I don't disable it. >>> >>> >>> On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman wrote: >>> >>>> Is there something specifically required to enable SSE? If it's not >>>> detected as available (based from the target triple?) then I don't think we >>>> enable it specifically. >>>> >>>> Also it seems that it should handle converting to/from the vector >>>> types, although I can see it getting confused about needing to do that if >>>> it thinks SSE isn't available at all. >>>> >>>> >>>> On 19/07/2013 3:47 PM, Craig Topper wrote: >>>> >>>> Hmm, maybe sse isn't being enabled so its falling back to emulating >>>> sqrt? >>>> >>>> >>>> On Thu, Jul 18, 2013 at 10:45 PM, Peter Newman wrote: >>>> >>>>> In the disassembly, I'm seeing three cases of >>>>> call 76719BA1 >>>>> >>>>> I am assuming this is the sqrt function as this is the only function >>>>> called in the LLVM IR. >>>>> >>>>> The code at 76719BA1 is: >>>>> >>>>> 76719BA1 push ebp >>>>> 76719BA2 mov ebp,esp >>>>> 76719BA4 sub esp,20h >>>>> 76719BA7 and esp,0FFFFFFF0h >>>>> 76719BAA fld st(0) >>>>> 76719BAC fst dword ptr [esp+18h] >>>>> 76719BB0 fistp qword ptr [esp+10h] >>>>> 76719BB4 fild qword ptr [esp+10h] >>>>> 76719BB8 mov edx,dword ptr [esp+18h] >>>>> 76719BBC mov eax,dword ptr [esp+10h] >>>>> 76719BC0 test eax,eax >>>>> 76719BC2 je 76719DCF >>>>> 76719BC8 fsubp st(1),st >>>>> 76719BCA test edx,edx >>>>> 76719BCC js 7671F9DB >>>>> 76719BD2 fstp dword ptr [esp] >>>>> 76719BD5 mov ecx,dword ptr [esp] >>>>> 76719BD8 add ecx,7FFFFFFFh >>>>> 76719BDE sbb eax,0 >>>>> 76719BE1 mov edx,dword ptr [esp+14h] >>>>> 76719BE5 sbb edx,0 >>>>> 76719BE8 leave >>>>> 76719BE9 ret >>>>> >>>>> >>>>> As you can see at 76719BD5, it modifies ECX . 
>>>>> >>>>> I don't know that this is the sqrtpd function (for example, I'm not >>>>> seeing any SSE instructions here?) but whatever it is, it's being called >>>>> from the IR I attached earlier, and is modifying ECX under some >>>>> circumstances. >>>>> >>>>> >>>>> On 19/07/2013 3:29 PM, Craig Topper wrote: >>>>> >>>>> That should map directly to sqrtpd which can't modify ecx. >>>>> >>>>> >>>>> On Thu, Jul 18, 2013 at 10:27 PM, Peter Newman wrote: >>>>> >>>>>> Sorry, that should have been llvm.x86.sse2.sqrt.pd >>>>>> >>>>>> >>>>>> On 19/07/2013 3:25 PM, Craig Topper wrote: >>>>>> >>>>>> What is "frep.x86.sse2.sqrt.pd". I'm only familiar with things >>>>>> prefixed with "llvm.x86". >>>>>> >>>>>> >>>>>> On Thu, Jul 18, 2013 at 10:12 PM, Peter Newman wrote: >>>>>> >>>>>>> After stepping through the produced assembly, I believe I have a >>>>>>> culprit. >>>>>>> >>>>>>> One of the calls to @frep.x86.sse2.sqrt.pd is modifying the value of >>>>>>> ECX - while the produced code is expecting it to still contain its previous >>>>>>> value. >>>>>>> >>>>>>> Peter N >>>>>>> >>>>>>> >>>>>>> On 19/07/2013 2:09 PM, Peter Newman wrote: >>>>>>> >>>>>>> I've attached the module->dump() that our code is producing. >>>>>>> Unfortunately this is the smallest test case I have available. >>>>>>> >>>>>>> This is before any optimization passes are applied. There are two >>>>>>> separate modules in existence at the time, and there are no guarantees >>>>>>> about the order the surrounding code calls those functions, so there may be >>>>>>> some interaction between them? There shouldn't be, they don't refer to any >>>>>>> common memory etc. There is no multi-threading occurring. >>>>>>> >>>>>>> The function in module-dump.ll (called crashfunc in this file) is >>>>>>> called with >>>>>>> - func_params 0x0018f3b0 double [3] >>>>>>> [0x0] -11.339976634695301 double >>>>>>> [0x1] -9.7504239056205506 double >>>>>>> [0x2] -5.2900856817382804 double >>>>>>> at the time of the exception. 
>>>>>>> >>>>>>> This is compiled on a "i686-pc-win32" triple. All of the >>>>>>> non-intrinsic functions referred to in these modules are the standard >>>>>>> equivalents from the MSVC library (e.g. @asin is the standard C lib >>>>>>> double asin( double ) ). >>>>>>> >>>>>>> Hopefully this is reproducible for you. >>>>>>> >>>>>>> -- >>>>>>> PeterN >>>>>>> >>>>>>> On 18/07/2013 4:37 PM, Craig Topper wrote: >>>>>>> >>>>>>> Are you able to send any IR for others to reproduce this issue? >>>>>>> >>>>>>> >>>>>>> On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman wrote: >>>>>>> >>>>>>>> Unfortunately, this doesn't appear to be the bug I'm hitting. I >>>>>>>> applied the fix to my source and it didn't make a difference. >>>>>>>> >>>>>>>> Also further testing found me getting the same behavior with other >>>>>>>> SIMD instructions. The common factor is in each case, ECX is set to >>>>>>>> 0x7fffffff, and it's an operation using xmm ptr ecx+offset . >>>>>>>> >>>>>>>> Additionally, turning the optimization level passed to createJIT >>>>>>>> down appears to avoid it, so I'm now leaning towards a bug in one of the >>>>>>>> optimization passes. >>>>>>>> >>>>>>>> I'm going to dig through the passes controlled by that parameter >>>>>>>> and see if I can narrow down which optimization is causing it. >>>>>>>> >>>>>>>> Peter N >>>>>>>> >>>>>>>> >>>>>>>> On 17/07/2013 1:58 PM, Solomon Boulos wrote: >>>>>>>> >>>>>>>>> As someone off list just told me, perhaps my new bug is the same >>>>>>>>> issue: >>>>>>>>> >>>>>>>>> http://llvm.org/bugs/show_bug.cgi?id=16640 >>>>>>>>> >>>>>>>>> Do you happen to be using FastISel? >>>>>>>>> >>>>>>>>> Solomon >>>>>>>>> >>>>>>>>> On Jul 16, 2013, at 6:39 PM, Peter Newman >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Hello all, >>>>>>>>>> >>>>>>>>>> I'm currently in the process of debugging a crash occurring in >>>>>>>>>> our program. 
In LLVM 3.2 and 3.3 it appears that JIT generated code is >>>>>>>>>> attempting to perform access unaligned memory with a SSE2 instruction. >>>>>>>>>> However this only happens under certain conditions that seem (but may not >>>>>>>>>> be) related to the stacks state on calling the function. >>>>>>>>>> >>>>>>>>>> Our program acts as a front-end, using the LLVM C++ API to >>>>>>>>>> generate a JIT generated function. This function is primarily mathematical, >>>>>>>>>> so we use the Vector types to take advantage of SIMD instructions (as well >>>>>>>>>> as a few SSE2 intrinsics). >>>>>>>>>> >>>>>>>>>> This worked in LLVM 2.8 but started failing in 3.2 and has >>>>>>>>>> continued to fail in 3.3. It fails with no optimizations applied to the >>>>>>>>>> LLVM Function/Module. It crashes with what is reported as a memory access >>>>>>>>>> error (accessing 0xffffffff), however it's suggested that this is how the >>>>>>>>>> SSE fault raising mechanism appears. >>>>>>>>>> >>>>>>>>>> The generated instruction varies, but it seems to often be >>>>>>>>>> similar to (I don't have it in front of me, sorry): >>>>>>>>>> movapd xmm0, xmm[ecx+0x???????] >>>>>>>>>> Where the xmm register changes, and the second parameter is a >>>>>>>>>> memory access. >>>>>>>>>> ECX is always set to 0x7ffffff - however I don't know if this is >>>>>>>>>> part of the SSE error reporting process or is part of the situation causing >>>>>>>>>> the error. >>>>>>>>>> >>>>>>>>>> I haven't worked out exactly what code path etc is causing this >>>>>>>>>> crash. I'm hoping that someone can tell me if there were any changed >>>>>>>>>> requirements for working with SIMD in LLVM 3.2 (or earlier, we haven't >>>>>>>>>> tried 3.0 or 3.1). I currently suspect the use of GlobalVariable (we first >>>>>>>>>> discovered the crash when using a feature that uses them), however I have >>>>>>>>>> attempted using setAlignment on the GlobalVariables without any change. 
>>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Peter N >>>>>>>>>> _______________________________________________ >>>>>>>>>> LLVM Developers mailing list >>>>>>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>>>>>>> >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> LLVM Developers mailing list >>>>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> ~Craig >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> ~Craig >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> ~Craig >>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> ~Craig >>>> >>>> >>>> >>> >>> >>> -- >>> ~Craig >>> >>> >>> >> >> >> -- >> ~Craig >> >> >> > > > -- > ~Craig > > > -- ~Craig -------------- next part -------------- An HTML attachment was scrubbed... URL: From peter at uformia.com Fri Jul 19 00:24:04 2013 From: peter at uformia.com (Peter Newman) Date: Fri, 19 Jul 2013 17:24:04 +1000 Subject: [LLVMdev] fptoui calling a function that modifies ECX In-Reply-To: References: <51E5F5ED.1000808@uformia.com> <7CA02EA7-A9F2-4A91-A10B-24ED35111F3F@cs.stanford.edu> <51E789ED.2080509@uformia.com> <51E8BBE9.8020407@uformia.com> <51E8CADA.1040506@uformia.com> <51E8CE40.3050703@uformia.com> <51E8D26B.3000108@uformia.com> <51E8D468.7030509@uformia.com> <51E8DE10.9090900@uformia.com> <51E8E123.2030402@uformia.com> <51E8E3D0.5040807@uformia.com> Message-ID: <51E8E994.3030400@uformia.com> Thank you, I'm trying this now. On 19/07/2013 5:23 PM, Craig Topper wrote: > Try adding ECX to the Defs of this part of > lib/Target/X86/X86InstrCompiler.td like I've done below. I don't have > a Windows machine to test myself. 
> > let Defs = [EAX, EDX, ECX, EFLAGS], FPForm = SpecialFP in { > def WIN_FTOL_32 : I<0, Pseudo, (outs), (ins RFP32:$src), > "# win32 fptoui", > [(X86WinFTOL RFP32:$src)]>, > Requires<[In32BitMode]>; > > def WIN_FTOL_64 : I<0, Pseudo, (outs), (ins RFP64:$src), > "# win32 fptoui", > [(X86WinFTOL RFP64:$src)]>, > Requires<[In32BitMode]>; > } > > > On Thu, Jul 18, 2013 at 11:59 PM, Peter Newman > wrote: > > Oh, excellent point, I agree. My bad. Now that I'm not assuming > those are the sqrt, I see the sqrtpd's in the output. Also there > are three fptoui's and there are 3 call instances. > > (Changing subject line again.) > > Now it looks like it's bug #13862 > > On 19/07/2013 4:51 PM, Craig Topper wrote: >> I think those calls correspond to this >> >> %110 = fptoui double %109 to i32 >> >> The calls are followed by an imul with 12 which matches up with >> what occurs right after the fptoui in the IR. >> >> >> On Thu, Jul 18, 2013 at 11:48 PM, Peter Newman > > wrote: >> >> Yes, that is the result of module-dump.ll >> >> >> On 19/07/2013 4:46 PM, Craig Topper wrote: >>> Does this correspond to one of the .ll files you sent earlier? >>> >>> >>> On Thu, Jul 18, 2013 at 11:34 PM, Peter Newman >>> > wrote: >>> >>> (Changing subject line as diagnosis has changed) >>> >>> I'm attaching the compiled code that I've been getting, >>> both with CodeGenOpt::Default and CodeGenOpt::None . The >>> crash isn't occurring with CodeGenOpt::None, but that >>> seems to be because ECX isn't being used - it still gets >>> set to 0x7fffffff by one of the calls to 76719BA1 >>> >>> I notice that X86::SQRTPD[m|r] appear in >>> X86InstrInfo::isHighLatencyDef. I was thinking an >>> optimization might be removing it, but I don't get the >>> sqrtpd instruction even if the createJIT optimization >>> level turned off. >>> >>> I am trying this with the Release 3.3 code - I'll try it >>> with trunk and see if I get a different result there. >>> Maybe there was a recent commit for this. 
>>> >>> -- >>> Peter N >>> >>> On 19/07/2013 4:00 PM, Craig Topper wrote: >>>> Hmm, I'm not able to get those .ll files to compile if >>>> I disable SSE and I end up with SSE >>>> instructions(including sqrtpd) if I don't disable it. >>>> >>>> >>>> On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman >>>> > wrote: >>>> >>>> Is there something specifically required to enable >>>> SSE? If it's not detected as available (based from >>>> the target triple?) then I don't think we enable it >>>> specifically. >>>> >>>> Also it seems that it should handle converting >>>> to/from the vector types, although I can see it >>>> getting confused about needing to do that if it >>>> thinks SSE isn't available at all. >>>> >>>> >>>> On 19/07/2013 3:47 PM, Craig Topper wrote: >>>>> Hmm, maybe sse isn't being enabled so its falling >>>>> back to emulating sqrt? >>>>> >>>>> >>>>> On Thu, Jul 18, 2013 at 10:45 PM, Peter Newman >>>>> > wrote: >>>>> >>>>> In the disassembly, I'm seeing three cases of >>>>> call 76719BA1 >>>>> >>>>> I am assuming this is the sqrt function as >>>>> this is the only function called in the LLVM IR. >>>>> >>>>> The code at 76719BA1 is: >>>>> >>>>> 76719BA1 push ebp >>>>> 76719BA2 mov ebp,esp >>>>> 76719BA4 sub esp,20h >>>>> 76719BA7 and esp,0FFFFFFF0h >>>>> 76719BAA fld st(0) >>>>> 76719BAC fst dword ptr [esp+18h] >>>>> 76719BB0 fistp qword ptr [esp+10h] >>>>> 76719BB4 fild qword ptr [esp+10h] >>>>> 76719BB8 mov edx,dword ptr [esp+18h] >>>>> 76719BBC mov eax,dword ptr [esp+10h] >>>>> 76719BC0 test eax,eax >>>>> 76719BC2 je 76719DCF >>>>> 76719BC8 fsubp st(1),st >>>>> 76719BCA test edx,edx >>>>> 76719BCC js 7671F9DB >>>>> 76719BD2 fstp dword ptr [esp] >>>>> 76719BD5 mov ecx,dword ptr [esp] >>>>> 76719BD8 add ecx,7FFFFFFFh >>>>> 76719BDE sbb eax,0 >>>>> 76719BE1 mov edx,dword ptr [esp+14h] >>>>> 76719BE5 sbb edx,0 >>>>> 76719BE8 leave >>>>> 76719BE9 ret >>>>> >>>>> >>>>> As you can see at 76719BD5, it modifies ECX . 
>>>>> >>>>> I don't know that this is the sqrtpd function >>>>> (for example, I'm not seeing any SSE >>>>> instructions here?) but whatever it is, it's >>>>> being called from the IR I attached earlier, >>>>> and is modifying ECX under some circumstances. >>>>> >>>>> >>>>> On 19/07/2013 3:29 PM, Craig Topper wrote: >>>>>> That should map directly to sqrtpd which >>>>>> can't modify ecx. >>>>>> >>>>>> >>>>>> On Thu, Jul 18, 2013 at 10:27 PM, Peter >>>>>> Newman >>>>> > wrote: >>>>>> >>>>>> Sorry, that should have been >>>>>> llvm.x86.sse2.sqrt.pd >>>>>> >>>>>> >>>>>> On 19/07/2013 3:25 PM, Craig Topper wrote: >>>>>>> What is "frep.x86.sse2.sqrt.pd". I'm >>>>>>> only familiar with things prefixed with >>>>>>> "llvm.x86". >>>>>>> >>>>>>> >>>>>>> On Thu, Jul 18, 2013 at 10:12 PM, Peter >>>>>>> Newman >>>>>> > wrote: >>>>>>> >>>>>>> After stepping through the produced >>>>>>> assembly, I believe I have a culprit. >>>>>>> >>>>>>> One of the calls to >>>>>>> @frep.x86.sse2.sqrt.pd is modifying >>>>>>> the value of ECX - while the >>>>>>> produced code is expecting it to >>>>>>> still contain its previous value. >>>>>>> >>>>>>> Peter N >>>>>>> >>>>>>> >>>>>>> On 19/07/2013 2:09 PM, Peter Newman >>>>>>> wrote: >>>>>>>> I've attached the module->dump() >>>>>>>> that our code is producing. >>>>>>>> Unfortunately this is the smallest >>>>>>>> test case I have available. >>>>>>>> >>>>>>>> This is before any optimization >>>>>>>> passes are applied. There are two >>>>>>>> separate modules in existence at >>>>>>>> the time, and there are no >>>>>>>> guarantees about the order the >>>>>>>> surrounding code calls those >>>>>>>> functions, so there may be some >>>>>>>> interaction between them? There >>>>>>>> shouldn't be, they don't refer to >>>>>>>> any common memory etc. There is no >>>>>>>> multi-threading occurring. 
>>>>>>>> >>>>>>>> The function in module-dump.ll >>>>>>>> (called crashfunc in this file) is >>>>>>>> called with >>>>>>>> - func_params 0x0018f3b0 double [3] >>>>>>>> [0x0] -11.339976634695301 double >>>>>>>> [0x1] -9.7504239056205506 double >>>>>>>> [0x2] -5.2900856817382804 double >>>>>>>> at the time of the exception. >>>>>>>> >>>>>>>> This is compiled on a >>>>>>>> "i686-pc-win32" triple. All of the >>>>>>>> non-intrinsic functions referred to >>>>>>>> in these modules are the standard >>>>>>>> equivalents from the MSVC library >>>>>>>> (e.g. @asin is the standard C lib >>>>>>>> double asin( double ) ). >>>>>>>> >>>>>>>> Hopefully this is reproducible for you. >>>>>>>> >>>>>>>> -- >>>>>>>> PeterN >>>>>>>> >>>>>>>> On 18/07/2013 4:37 PM, Craig Topper >>>>>>>> wrote: >>>>>>>>> Are you able to send any IR for >>>>>>>>> others to reproduce this issue? >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Jul 17, 2013 at 11:23 PM, >>>>>>>>> Peter Newman >>>>>>>> > wrote: >>>>>>>>> >>>>>>>>> Unfortunately, this doesn't >>>>>>>>> appear to be the bug I'm >>>>>>>>> hitting. I applied the fix to >>>>>>>>> my source and it didn't make a >>>>>>>>> difference. >>>>>>>>> >>>>>>>>> Also further testing found me >>>>>>>>> getting the same behavior with >>>>>>>>> other SIMD instructions. The >>>>>>>>> common factor is in each case, >>>>>>>>> ECX is set to 0x7fffffff, and >>>>>>>>> it's an operation using xmm >>>>>>>>> ptr ecx+offset . >>>>>>>>> >>>>>>>>> Additionally, turning the >>>>>>>>> optimization level passed to >>>>>>>>> createJIT down appears to >>>>>>>>> avoid it, so I'm now leaning >>>>>>>>> towards a bug in one of the >>>>>>>>> optimization passes. >>>>>>>>> >>>>>>>>> I'm going to dig through the >>>>>>>>> passes controlled by that >>>>>>>>> parameter and see if I can >>>>>>>>> narrow down which optimization >>>>>>>>> is causing it. 
>>>>>>>>> >>>>>>>>> Peter N >>>>>>>>> >>>>>>>>> >>>>>>>>> On 17/07/2013 1:58 PM, Solomon >>>>>>>>> Boulos wrote: >>>>>>>>> >>>>>>>>> As someone off list just >>>>>>>>> told me, perhaps my new >>>>>>>>> bug is the same issue: >>>>>>>>> >>>>>>>>> http://llvm.org/bugs/show_bug.cgi?id=16640 >>>>>>>>> >>>>>>>>> Do you happen to be using >>>>>>>>> FastISel? >>>>>>>>> >>>>>>>>> Solomon >>>>>>>>> >>>>>>>>> On Jul 16, 2013, at 6:39 >>>>>>>>> PM, Peter Newman >>>>>>>>> >>>>>>>> > wrote: >>>>>>>>> >>>>>>>>> Hello all, >>>>>>>>> >>>>>>>>> I'm currently in the >>>>>>>>> process of debugging a >>>>>>>>> crash occurring in our >>>>>>>>> program. In LLVM 3.2 >>>>>>>>> and 3.3 it appears >>>>>>>>> that JIT generated >>>>>>>>> code is attempting to >>>>>>>>> perform access >>>>>>>>> unaligned memory with >>>>>>>>> a SSE2 instruction. >>>>>>>>> However this only >>>>>>>>> happens under certain >>>>>>>>> conditions that seem >>>>>>>>> (but may not be) >>>>>>>>> related to the stacks >>>>>>>>> state on calling the >>>>>>>>> function. >>>>>>>>> >>>>>>>>> Our program acts as a >>>>>>>>> front-end, using the >>>>>>>>> LLVM C++ API to >>>>>>>>> generate a JIT >>>>>>>>> generated function. >>>>>>>>> This function is >>>>>>>>> primarily >>>>>>>>> mathematical, so we >>>>>>>>> use the Vector types >>>>>>>>> to take advantage of >>>>>>>>> SIMD instructions (as >>>>>>>>> well as a few SSE2 >>>>>>>>> intrinsics). >>>>>>>>> >>>>>>>>> This worked in LLVM >>>>>>>>> 2.8 but started >>>>>>>>> failing in 3.2 and has >>>>>>>>> continued to fail in >>>>>>>>> 3.3. It fails with no >>>>>>>>> optimizations applied >>>>>>>>> to the LLVM >>>>>>>>> Function/Module. It >>>>>>>>> crashes with what is >>>>>>>>> reported as a memory >>>>>>>>> access error >>>>>>>>> (accessing >>>>>>>>> 0xffffffff), however >>>>>>>>> it's suggested that >>>>>>>>> this is how the SSE >>>>>>>>> fault raising >>>>>>>>> mechanism appears. 
>>>>>>>>> >>>>>>>>> The generated >>>>>>>>> instruction varies, >>>>>>>>> but it seems to often >>>>>>>>> be similar to (I don't >>>>>>>>> have it in front of >>>>>>>>> me, sorry): >>>>>>>>> movapd xmm0, >>>>>>>>> xmm[ecx+0x???????] >>>>>>>>> Where the xmm register >>>>>>>>> changes, and the >>>>>>>>> second parameter is a >>>>>>>>> memory access. >>>>>>>>> ECX is always set to >>>>>>>>> 0x7ffffff - however I >>>>>>>>> don't know if this is >>>>>>>>> part of the SSE error >>>>>>>>> reporting process or >>>>>>>>> is part of the >>>>>>>>> situation causing the >>>>>>>>> error. >>>>>>>>> >>>>>>>>> I haven't worked out >>>>>>>>> exactly what code path >>>>>>>>> etc is causing this >>>>>>>>> crash. I'm hoping that >>>>>>>>> someone can tell me if >>>>>>>>> there were any changed >>>>>>>>> requirements for >>>>>>>>> working with SIMD in >>>>>>>>> LLVM 3.2 (or earlier, >>>>>>>>> we haven't tried 3.0 >>>>>>>>> or 3.1). I currently >>>>>>>>> suspect the use of >>>>>>>>> GlobalVariable (we >>>>>>>>> first discovered the >>>>>>>>> crash when using a >>>>>>>>> feature that uses >>>>>>>>> them), however I have >>>>>>>>> attempted using >>>>>>>>> setAlignment on the >>>>>>>>> GlobalVariables >>>>>>>>> without any change. 
>>>>>>>>> >>>>>>>>> -- >>>>>>>>> Peter N >>>>>>>>> _______________________________________________ >>>>>>>>> LLVM Developers >>>>>>>>> mailing list >>>>>>>>> LLVMdev at cs.uiuc.edu >>>>>>>>> >>>>>>>>> http://llvm.cs.uiuc.edu >>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>>>>>> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> LLVM Developers mailing list >>>>>>>>> LLVMdev at cs.uiuc.edu >>>>>>>>> >>>>>>>>> http://llvm.cs.uiuc.edu >>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> ~Craig >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> ~Craig >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> ~Craig >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> ~Craig >>>> >>>> >>>> >>>> >>>> -- >>>> ~Craig >>> >>> >>> >>> >>> -- >>> ~Craig >> >> >> >> >> -- >> ~Craig > > > > > -- > ~Craig -------------- next part -------------- An HTML attachment was scrubbed... URL: From craig.topper at gmail.com Fri Jul 19 00:45:09 2013 From: craig.topper at gmail.com (Craig Topper) Date: Fri, 19 Jul 2013 00:45:09 -0700 Subject: [LLVMdev] fptoui calling a function that modifies ECX In-Reply-To: <51E8E994.3030400@uformia.com> References: <51E5F5ED.1000808@uformia.com> <7CA02EA7-A9F2-4A91-A10B-24ED35111F3F@cs.stanford.edu> <51E789ED.2080509@uformia.com> <51E8BBE9.8020407@uformia.com> <51E8CADA.1040506@uformia.com> <51E8CE40.3050703@uformia.com> <51E8D26B.3000108@uformia.com> <51E8D468.7030509@uformia.com> <51E8DE10.9090900@uformia.com> <51E8E123.2030402@uformia.com> <51E8E3D0.5040807@uformia.com> <51E8E994.3030400@uformia.com> Message-ID: I don't think that's going to work. On Fri, Jul 19, 2013 at 12:24 AM, Peter Newman wrote: > Thank you, I'm trying this now. > > > On 19/07/2013 5:23 PM, Craig Topper wrote: > > Try adding ECX to the Defs of this part of > lib/Target/X86/X86InstrCompiler.td like I've done below. I don't have a > Windows machine to test myself. 
> > let Defs = [EAX, EDX, ECX, EFLAGS], FPForm = SpecialFP in { > def WIN_FTOL_32 : I<0, Pseudo, (outs), (ins RFP32:$src), > "# win32 fptoui", > [(X86WinFTOL RFP32:$src)]>, > Requires<[In32BitMode]>; > > def WIN_FTOL_64 : I<0, Pseudo, (outs), (ins RFP64:$src), > "# win32 fptoui", > [(X86WinFTOL RFP64:$src)]>, > Requires<[In32BitMode]>; > } > > > On Thu, Jul 18, 2013 at 11:59 PM, Peter Newman wrote: > >> Oh, excellent point, I agree. My bad. Now that I'm not assuming those >> are the sqrt, I see the sqrtpd's in the output. Also there are three >> fptoui's and there are 3 call instances. >> >> (Changing subject line again.) >> >> Now it looks like it's bug #13862 >> >> On 19/07/2013 4:51 PM, Craig Topper wrote: >> >> I think those calls correspond to this >> >> %110 = fptoui double %109 to i32 >> >> The calls are followed by an imul with 12 which matches up with what >> occurs right after the fptoui in the IR. >> >> >> On Thu, Jul 18, 2013 at 11:48 PM, Peter Newman wrote: >> >>> Yes, that is the result of module-dump.ll >>> >>> >>> On 19/07/2013 4:46 PM, Craig Topper wrote: >>> >>> Does this correspond to one of the .ll files you sent earlier? >>> >>> >>> On Thu, Jul 18, 2013 at 11:34 PM, Peter Newman wrote: >>> >>>> (Changing subject line as diagnosis has changed) >>>> >>>> I'm attaching the compiled code that I've been getting, both with >>>> CodeGenOpt::Default and CodeGenOpt::None . The crash isn't occurring with >>>> CodeGenOpt::None, but that seems to be because ECX isn't being used - it >>>> still gets set to 0x7fffffff by one of the calls to 76719BA1 >>>> >>>> I notice that X86::SQRTPD[m|r] appear in >>>> X86InstrInfo::isHighLatencyDef. I was thinking an optimization might be >>>> removing it, but I don't get the sqrtpd instruction even if the createJIT >>>> optimization level turned off. >>>> >>>> I am trying this with the Release 3.3 code - I'll try it with trunk and >>>> see if I get a different result there. Maybe there was a recent commit for >>>> this. 
>>>> >>>> -- >>>> Peter N >>>> >>>> On 19/07/2013 4:00 PM, Craig Topper wrote: >>>> >>>> Hmm, I'm not able to get those .ll files to compile if I disable SSE >>>> and I end up with SSE instructions(including sqrtpd) if I don't disable it. >>>> >>>> >>>> On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman wrote: >>>> >>>>> Is there something specifically required to enable SSE? If it's not >>>>> detected as available (based from the target triple?) then I don't think we >>>>> enable it specifically. >>>>> >>>>> Also it seems that it should handle converting to/from the vector >>>>> types, although I can see it getting confused about needing to do that if >>>>> it thinks SSE isn't available at all. >>>>> >>>>> >>>>> On 19/07/2013 3:47 PM, Craig Topper wrote: >>>>> >>>>> Hmm, maybe sse isn't being enabled so its falling back to emulating >>>>> sqrt? >>>>> >>>>> >>>>> On Thu, Jul 18, 2013 at 10:45 PM, Peter Newman wrote: >>>>> >>>>>> In the disassembly, I'm seeing three cases of >>>>>> call 76719BA1 >>>>>> >>>>>> I am assuming this is the sqrt function as this is the only function >>>>>> called in the LLVM IR. >>>>>> >>>>>> The code at 76719BA1 is: >>>>>> >>>>>> 76719BA1 push ebp >>>>>> 76719BA2 mov ebp,esp >>>>>> 76719BA4 sub esp,20h >>>>>> 76719BA7 and esp,0FFFFFFF0h >>>>>> 76719BAA fld st(0) >>>>>> 76719BAC fst dword ptr [esp+18h] >>>>>> 76719BB0 fistp qword ptr [esp+10h] >>>>>> 76719BB4 fild qword ptr [esp+10h] >>>>>> 76719BB8 mov edx,dword ptr [esp+18h] >>>>>> 76719BBC mov eax,dword ptr [esp+10h] >>>>>> 76719BC0 test eax,eax >>>>>> 76719BC2 je 76719DCF >>>>>> 76719BC8 fsubp st(1),st >>>>>> 76719BCA test edx,edx >>>>>> 76719BCC js 7671F9DB >>>>>> 76719BD2 fstp dword ptr [esp] >>>>>> 76719BD5 mov ecx,dword ptr [esp] >>>>>> 76719BD8 add ecx,7FFFFFFFh >>>>>> 76719BDE sbb eax,0 >>>>>> 76719BE1 mov edx,dword ptr [esp+14h] >>>>>> 76719BE5 sbb edx,0 >>>>>> 76719BE8 leave >>>>>> 76719BE9 ret >>>>>> >>>>>> >>>>>> As you can see at 76719BD5, it modifies ECX . 
>>>>>> >>>>>> I don't know that this is the sqrtpd function (for example, I'm not >>>>>> seeing any SSE instructions here?) but whatever it is, it's being called >>>>>> from the IR I attached earlier, and is modifying ECX under some >>>>>> circumstances. >>>>>> >>>>>> >>>>>> On 19/07/2013 3:29 PM, Craig Topper wrote: >>>>>> >>>>>> That should map directly to sqrtpd which can't modify ecx. >>>>>> >>>>>> >>>>>> On Thu, Jul 18, 2013 at 10:27 PM, Peter Newman wrote: >>>>>> >>>>>>> Sorry, that should have been llvm.x86.sse2.sqrt.pd >>>>>>> >>>>>>> >>>>>>> On 19/07/2013 3:25 PM, Craig Topper wrote: >>>>>>> >>>>>>> What is "frep.x86.sse2.sqrt.pd". I'm only familiar with things >>>>>>> prefixed with "llvm.x86". >>>>>>> >>>>>>> >>>>>>> On Thu, Jul 18, 2013 at 10:12 PM, Peter Newman wrote: >>>>>>> >>>>>>>> After stepping through the produced assembly, I believe I have a >>>>>>>> culprit. >>>>>>>> >>>>>>>> One of the calls to @frep.x86.sse2.sqrt.pd is modifying the value >>>>>>>> of ECX - while the produced code is expecting it to still contain its >>>>>>>> previous value. >>>>>>>> >>>>>>>> Peter N >>>>>>>> >>>>>>>> >>>>>>>> On 19/07/2013 2:09 PM, Peter Newman wrote: >>>>>>>> >>>>>>>> I've attached the module->dump() that our code is producing. >>>>>>>> Unfortunately this is the smallest test case I have available. >>>>>>>> >>>>>>>> This is before any optimization passes are applied. There are two >>>>>>>> separate modules in existence at the time, and there are no guarantees >>>>>>>> about the order the surrounding code calls those functions, so there may be >>>>>>>> some interaction between them? There shouldn't be, they don't refer to any >>>>>>>> common memory etc. There is no multi-threading occurring. 
>>>>>>>> >>>>>>>> The function in module-dump.ll (called crashfunc in this file) is >>>>>>>> called with >>>>>>>> - func_params 0x0018f3b0 double [3] >>>>>>>> [0x0] -11.339976634695301 double >>>>>>>> [0x1] -9.7504239056205506 double >>>>>>>> [0x2] -5.2900856817382804 double >>>>>>>> at the time of the exception. >>>>>>>> >>>>>>>> This is compiled on a "i686-pc-win32" triple. All of the >>>>>>>> non-intrinsic functions referred to in these modules are the standard >>>>>>>> equivalents from the MSVC library (e.g. @asin is the standard C lib >>>>>>>> double asin( double ) ). >>>>>>>> >>>>>>>> Hopefully this is reproducible for you. >>>>>>>> >>>>>>>> -- >>>>>>>> PeterN >>>>>>>> >>>>>>>> On 18/07/2013 4:37 PM, Craig Topper wrote: >>>>>>>> >>>>>>>> Are you able to send any IR for others to reproduce this issue? >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman wrote: >>>>>>>> >>>>>>>>> Unfortunately, this doesn't appear to be the bug I'm hitting. I >>>>>>>>> applied the fix to my source and it didn't make a difference. >>>>>>>>> >>>>>>>>> Also further testing found me getting the same behavior with other >>>>>>>>> SIMD instructions. The common factor is in each case, ECX is set to >>>>>>>>> 0x7fffffff, and it's an operation using xmm ptr ecx+offset . >>>>>>>>> >>>>>>>>> Additionally, turning the optimization level passed to createJIT >>>>>>>>> down appears to avoid it, so I'm now leaning towards a bug in one of the >>>>>>>>> optimization passes. >>>>>>>>> >>>>>>>>> I'm going to dig through the passes controlled by that parameter >>>>>>>>> and see if I can narrow down which optimization is causing it. >>>>>>>>> >>>>>>>>> Peter N >>>>>>>>> >>>>>>>>> >>>>>>>>> On 17/07/2013 1:58 PM, Solomon Boulos wrote: >>>>>>>>> >>>>>>>>>> As someone off list just told me, perhaps my new bug is the same >>>>>>>>>> issue: >>>>>>>>>> >>>>>>>>>> http://llvm.org/bugs/show_bug.cgi?id=16640 >>>>>>>>>> >>>>>>>>>> Do you happen to be using FastISel? 
>>>>>>>>>> >>>>>>>>>> Solomon >>>>>>>>>> >>>>>>>>>> On Jul 16, 2013, at 6:39 PM, Peter Newman >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Hello all, >>>>>>>>>>> >>>>>>>>>>> I'm currently in the process of debugging a crash occurring in >>>>>>>>>>> our program. In LLVM 3.2 and 3.3 it appears that JIT generated code is >>>>>>>>>>> attempting to perform access unaligned memory with a SSE2 instruction. >>>>>>>>>>> However this only happens under certain conditions that seem (but may not >>>>>>>>>>> be) related to the stacks state on calling the function. >>>>>>>>>>> >>>>>>>>>>> Our program acts as a front-end, using the LLVM C++ API to >>>>>>>>>>> generate a JIT generated function. This function is primarily mathematical, >>>>>>>>>>> so we use the Vector types to take advantage of SIMD instructions (as well >>>>>>>>>>> as a few SSE2 intrinsics). >>>>>>>>>>> >>>>>>>>>>> This worked in LLVM 2.8 but started failing in 3.2 and has >>>>>>>>>>> continued to fail in 3.3. It fails with no optimizations applied to the >>>>>>>>>>> LLVM Function/Module. It crashes with what is reported as a memory access >>>>>>>>>>> error (accessing 0xffffffff), however it's suggested that this is how the >>>>>>>>>>> SSE fault raising mechanism appears. >>>>>>>>>>> >>>>>>>>>>> The generated instruction varies, but it seems to often be >>>>>>>>>>> similar to (I don't have it in front of me, sorry): >>>>>>>>>>> movapd xmm0, xmm[ecx+0x???????] >>>>>>>>>>> Where the xmm register changes, and the second parameter is a >>>>>>>>>>> memory access. >>>>>>>>>>> ECX is always set to 0x7ffffff - however I don't know if this is >>>>>>>>>>> part of the SSE error reporting process or is part of the situation causing >>>>>>>>>>> the error. >>>>>>>>>>> >>>>>>>>>>> I haven't worked out exactly what code path etc is causing this >>>>>>>>>>> crash. 
I'm hoping that someone can tell me if there were any changed >>>>>>>>>>> requirements for working with SIMD in LLVM 3.2 (or earlier, we haven't >>>>>>>>>>> tried 3.0 or 3.1). I currently suspect the use of GlobalVariable (we first >>>>>>>>>>> discovered the crash when using a feature that uses them), however I have >>>>>>>>>>> attempted using setAlignment on the GlobalVariables without any change. >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Peter N >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> LLVM Developers mailing list >>>>>>>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> LLVM Developers mailing list >>>>>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> ~Craig >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> ~Craig >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> ~Craig >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> ~Craig >>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> ~Craig >>>> >>>> >>>> >>> >>> >>> -- >>> ~Craig >>> >>> >>> >> >> >> -- >> ~Craig >> >> >> > > > -- > ~Craig > > > -- ~Craig -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From David.Chisnall at cl.cam.ac.uk Fri Jul 19 01:07:16 2013 From: David.Chisnall at cl.cam.ac.uk (David Chisnall) Date: Fri, 19 Jul 2013 09:07:16 +0100 Subject: [LLVMdev] clang searching for many linux directories that do not exist on FreeBSD host In-Reply-To: References: <51E82E36.7020505@pix.net> Message-ID: <990E87DE-597C-492B-A14F-191C42EF53F8@cl.cam.ac.uk> On 19 Jul 2013, at 02:14, Eli Friedman wrote: > It's straightforward: you just need to make toolchains::FreeBSD > inherit directly from ToolChain and implement all the methods it would > otherwise inherit from Generic_ELF (which in turn inherits from > Generic_GCC). Wouldn't it make more sense to move the Linux-specific code out of Generic_GCC and into the Linux toolchain, rather than making all of the other subclasses of Generic_GCC reimplement the common code? David From dacian_herbei at yahoo.fr Fri Jul 19 01:20:27 2013 From: dacian_herbei at yahoo.fr (Herbei Dacian) Date: Fri, 19 Jul 2013 09:20:27 +0100 (BST) Subject: [LLVMdev] llva-emu In-Reply-To: References: Message-ID: <1374222027.3621.YahooMailNeo@web172606.mail.ir2.yahoo.com> Hi All, can anyone tell me where I can find the sources for the llva-emu project? I've tried to contact  Michael Brukman or Brian Gaeke but no reply. thank you for any help, dacian -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From peter at uformia.com Fri Jul 19 02:34:30 2013 From: peter at uformia.com (Peter Newman) Date: Fri, 19 Jul 2013 19:34:30 +1000 Subject: [LLVMdev] fptoui calling a function that modifies ECX In-Reply-To: References: <51E5F5ED.1000808@uformia.com> <51E789ED.2080509@uformia.com> <51E8BBE9.8020407@uformia.com> <51E8CADA.1040506@uformia.com> <51E8CE40.3050703@uformia.com> <51E8D26B.3000108@uformia.com> <51E8D468.7030509@uformia.com> <51E8DE10.9090900@uformia.com> <51E8E123.2030402@uformia.com> <51E8E3D0.5040807@uformia.com> <51E8E994.3030400@uformia.com> Message-ID: <51E90826.3060201@uformia.com> That does appear to have worked. All my tests are passing now. I'll hand this out to our other devs & testers and make sure it's working for them as well (not just on my machine). Thank you, again. -- Peter N On 19/07/2013 5:45 PM, Craig Topper wrote: > I don't think that's going to work. > > > On Fri, Jul 19, 2013 at 12:24 AM, Peter Newman > wrote: > > Thank you, I'm trying this now. > > > On 19/07/2013 5:23 PM, Craig Topper wrote: >> Try adding ECX to the Defs of this part of >> lib/Target/X86/X86InstrCompiler.td like I've done below. I don't >> have a Windows machine to test myself. >> >> let Defs = [EAX, EDX, ECX, EFLAGS], FPForm = SpecialFP in { >> def WIN_FTOL_32 : I<0, Pseudo, (outs), (ins RFP32:$src), >> "# win32 fptoui", >> [(X86WinFTOL RFP32:$src)]>, >> Requires<[In32BitMode]>; >> >> def WIN_FTOL_64 : I<0, Pseudo, (outs), (ins RFP64:$src), >> "# win32 fptoui", >> [(X86WinFTOL RFP64:$src)]>, >> Requires<[In32BitMode]>; >> } >> >> >> On Thu, Jul 18, 2013 at 11:59 PM, Peter Newman > > wrote: >> >> Oh, excellent point, I agree. My bad. Now that I'm not >> assuming those are the sqrt, I see the sqrtpd's in the >> output. Also there are three fptoui's and there are 3 call >> instances. >> >> (Changing subject line again.) 
>> >> Now it looks like it's bug #13862 >> >> On 19/07/2013 4:51 PM, Craig Topper wrote: >>> I think those calls correspond to this >>> >>> %110 = fptoui double %109 to i32 >>> >>> The calls are followed by an imul with 12 which matches up >>> with what occurs right after the fptoui in the IR. >>> >>> >>> On Thu, Jul 18, 2013 at 11:48 PM, Peter Newman >>> > wrote: >>> >>> Yes, that is the result of module-dump.ll >>> >>> >>> On 19/07/2013 4:46 PM, Craig Topper wrote: >>>> Does this correspond to one of the .ll files you sent >>>> earlier? >>>> >>>> >>>> On Thu, Jul 18, 2013 at 11:34 PM, Peter Newman >>>> > wrote: >>>> >>>> (Changing subject line as diagnosis has changed) >>>> >>>> I'm attaching the compiled code that I've been >>>> getting, both with CodeGenOpt::Default and >>>> CodeGenOpt::None . The crash isn't occurring with >>>> CodeGenOpt::None, but that seems to be because ECX >>>> isn't being used - it still gets set to 0x7fffffff >>>> by one of the calls to 76719BA1 >>>> >>>> I notice that X86::SQRTPD[m|r] appear in >>>> X86InstrInfo::isHighLatencyDef. I was thinking an >>>> optimization might be removing it, but I don't get >>>> the sqrtpd instruction even if the createJIT >>>> optimization level turned off. >>>> >>>> I am trying this with the Release 3.3 code - I'll >>>> try it with trunk and see if I get a different >>>> result there. Maybe there was a recent commit for this. >>>> >>>> -- >>>> Peter N >>>> >>>> On 19/07/2013 4:00 PM, Craig Topper wrote: >>>>> Hmm, I'm not able to get those .ll files to >>>>> compile if I disable SSE and I end up with SSE >>>>> instructions(including sqrtpd) if I don't disable it. >>>>> >>>>> >>>>> On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman >>>>> > wrote: >>>>> >>>>> Is there something specifically required to >>>>> enable SSE? If it's not detected as available >>>>> (based from the target triple?) then I don't >>>>> think we enable it specifically. 
>>>>> >>>>> Also it seems that it should handle converting >>>>> to/from the vector types, although I can see >>>>> it getting confused about needing to do that >>>>> if it thinks SSE isn't available at all. >>>>> >>>>> >>>>> On 19/07/2013 3:47 PM, Craig Topper wrote: >>>>>> Hmm, maybe sse isn't being enabled so its >>>>>> falling back to emulating sqrt? >>>>>> >>>>>> >>>>>> On Thu, Jul 18, 2013 at 10:45 PM, Peter >>>>>> Newman >>>>> > wrote: >>>>>> >>>>>> In the disassembly, I'm seeing three cases of >>>>>> call 76719BA1 >>>>>> >>>>>> I am assuming this is the sqrt function >>>>>> as this is the only function called in >>>>>> the LLVM IR. >>>>>> >>>>>> The code at 76719BA1 is: >>>>>> >>>>>> 76719BA1 push ebp >>>>>> 76719BA2 mov ebp,esp >>>>>> 76719BA4 sub esp,20h >>>>>> 76719BA7 and esp,0FFFFFFF0h >>>>>> 76719BAA fld st(0) >>>>>> 76719BAC fst dword ptr [esp+18h] >>>>>> 76719BB0 fistp qword ptr [esp+10h] >>>>>> 76719BB4 fild qword ptr [esp+10h] >>>>>> 76719BB8 mov edx,dword ptr [esp+18h] >>>>>> 76719BBC mov eax,dword ptr [esp+10h] >>>>>> 76719BC0 test eax,eax >>>>>> 76719BC2 je 76719DCF >>>>>> 76719BC8 fsubp st(1),st >>>>>> 76719BCA test edx,edx >>>>>> 76719BCC js 7671F9DB >>>>>> 76719BD2 fstp dword ptr [esp] >>>>>> 76719BD5 mov ecx,dword ptr [esp] >>>>>> 76719BD8 add ecx,7FFFFFFFh >>>>>> 76719BDE sbb eax,0 >>>>>> 76719BE1 mov edx,dword ptr [esp+14h] >>>>>> 76719BE5 sbb edx,0 >>>>>> 76719BE8 leave >>>>>> 76719BE9 ret >>>>>> >>>>>> >>>>>> As you can see at 76719BD5, it modifies ECX . >>>>>> >>>>>> I don't know that this is the sqrtpd >>>>>> function (for example, I'm not seeing any >>>>>> SSE instructions here?) but whatever it >>>>>> is, it's being called from the IR I >>>>>> attached earlier, and is modifying ECX >>>>>> under some circumstances. >>>>>> >>>>>> >>>>>> On 19/07/2013 3:29 PM, Craig Topper wrote: >>>>>>> That should map directly to sqrtpd which >>>>>>> can't modify ecx. 
>>>>>>> >>>>>>> >>>>>>> On Thu, Jul 18, 2013 at 10:27 PM, Peter >>>>>>> Newman >>>>>> > wrote: >>>>>>> >>>>>>> Sorry, that should have been >>>>>>> llvm.x86.sse2.sqrt.pd >>>>>>> >>>>>>> >>>>>>> On 19/07/2013 3:25 PM, Craig Topper >>>>>>> wrote: >>>>>>>> What is "frep.x86.sse2.sqrt.pd". >>>>>>>> I'm only familiar with things >>>>>>>> prefixed with "llvm.x86". >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Jul 18, 2013 at 10:12 PM, >>>>>>>> Peter Newman >>>>>>> > wrote: >>>>>>>> >>>>>>>> After stepping through the >>>>>>>> produced assembly, I believe I >>>>>>>> have a culprit. >>>>>>>> >>>>>>>> One of the calls to >>>>>>>> @frep.x86.sse2.sqrt.pd is >>>>>>>> modifying the value of ECX - >>>>>>>> while the produced code is >>>>>>>> expecting it to still contain >>>>>>>> its previous value. >>>>>>>> >>>>>>>> Peter N >>>>>>>> >>>>>>>> >>>>>>>> On 19/07/2013 2:09 PM, Peter >>>>>>>> Newman wrote: >>>>>>>>> I've attached the >>>>>>>>> module->dump() that our code >>>>>>>>> is producing. Unfortunately >>>>>>>>> this is the smallest test case >>>>>>>>> I have available. >>>>>>>>> >>>>>>>>> This is before any >>>>>>>>> optimization passes are >>>>>>>>> applied. There are two >>>>>>>>> separate modules in existence >>>>>>>>> at the time, and there are no >>>>>>>>> guarantees about the order the >>>>>>>>> surrounding code calls those >>>>>>>>> functions, so there may be >>>>>>>>> some interaction between them? >>>>>>>>> There shouldn't be, they don't >>>>>>>>> refer to any common memory >>>>>>>>> etc. There is no >>>>>>>>> multi-threading occurring. >>>>>>>>> >>>>>>>>> The function in module-dump.ll >>>>>>>>> (called crashfunc in this >>>>>>>>> file) is called with >>>>>>>>> - func_params 0x0018f3b0 >>>>>>>>> double [3] >>>>>>>>> [0x0] -11.339976634695301 double >>>>>>>>> [0x1] -9.7504239056205506 double >>>>>>>>> [0x2] -5.2900856817382804 double >>>>>>>>> at the time of the exception. >>>>>>>>> >>>>>>>>> This is compiled on a >>>>>>>>> "i686-pc-win32" triple. 
All of >>>>>>>>> the non-intrinsic functions >>>>>>>>> referred to in these modules >>>>>>>>> are the standard equivalents >>>>>>>>> from the MSVC library (e.g. >>>>>>>>> @asin is the standard C lib >>>>>>>>> double asin( double ) ). >>>>>>>>> >>>>>>>>> Hopefully this is reproducible >>>>>>>>> for you. >>>>>>>>> >>>>>>>>> -- >>>>>>>>> PeterN >>>>>>>>> >>>>>>>>> On 18/07/2013 4:37 PM, Craig >>>>>>>>> Topper wrote: >>>>>>>>>> Are you able to send any IR >>>>>>>>>> for others to reproduce this >>>>>>>>>> issue? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Jul 17, 2013 at 11:23 >>>>>>>>>> PM, Peter Newman >>>>>>>>>> >>>>>>>>> > >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Unfortunately, this >>>>>>>>>> doesn't appear to be the >>>>>>>>>> bug I'm hitting. I >>>>>>>>>> applied the fix to my >>>>>>>>>> source and it didn't make >>>>>>>>>> a difference. >>>>>>>>>> >>>>>>>>>> Also further testing >>>>>>>>>> found me getting the same >>>>>>>>>> behavior with other SIMD >>>>>>>>>> instructions. The common >>>>>>>>>> factor is in each case, >>>>>>>>>> ECX is set to 0x7fffffff, >>>>>>>>>> and it's an operation >>>>>>>>>> using xmm ptr ecx+offset . >>>>>>>>>> >>>>>>>>>> Additionally, turning the >>>>>>>>>> optimization level passed >>>>>>>>>> to createJIT down appears >>>>>>>>>> to avoid it, so I'm now >>>>>>>>>> leaning towards a bug in >>>>>>>>>> one of the optimization >>>>>>>>>> passes. >>>>>>>>>> >>>>>>>>>> I'm going to dig through >>>>>>>>>> the passes controlled by >>>>>>>>>> that parameter and see if >>>>>>>>>> I can narrow down which >>>>>>>>>> optimization is causing it. >>>>>>>>>> >>>>>>>>>> Peter N >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 17/07/2013 1:58 PM, >>>>>>>>>> Solomon Boulos wrote: >>>>>>>>>> >>>>>>>>>> As someone off list >>>>>>>>>> just told me, perhaps >>>>>>>>>> my new bug is the >>>>>>>>>> same issue: >>>>>>>>>> >>>>>>>>>> http://llvm.org/bugs/show_bug.cgi?id=16640 >>>>>>>>>> >>>>>>>>>> Do you happen to be >>>>>>>>>> using FastISel? 
>>>>>>>>>> >>>>>>>>>> Solomon >>>>>>>>>> >>>>>>>>>> On Jul 16, 2013, at >>>>>>>>>> 6:39 PM, Peter Newman >>>>>>>>>> >>>>>>>>> > >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Hello all, >>>>>>>>>> >>>>>>>>>> I'm currently in >>>>>>>>>> the process of >>>>>>>>>> debugging a crash >>>>>>>>>> occurring in our >>>>>>>>>> program. In LLVM >>>>>>>>>> 3.2 and 3.3 it >>>>>>>>>> appears that JIT >>>>>>>>>> generated code is >>>>>>>>>> attempting to >>>>>>>>>> perform access >>>>>>>>>> unaligned memory >>>>>>>>>> with a SSE2 >>>>>>>>>> instruction. >>>>>>>>>> However this only >>>>>>>>>> happens under >>>>>>>>>> certain >>>>>>>>>> conditions that >>>>>>>>>> seem (but may not >>>>>>>>>> be) related to >>>>>>>>>> the stacks state >>>>>>>>>> on calling the >>>>>>>>>> function. >>>>>>>>>> >>>>>>>>>> Our program acts >>>>>>>>>> as a front-end, >>>>>>>>>> using the LLVM >>>>>>>>>> C++ API to >>>>>>>>>> generate a JIT >>>>>>>>>> generated >>>>>>>>>> function. This >>>>>>>>>> function is >>>>>>>>>> primarily >>>>>>>>>> mathematical, so >>>>>>>>>> we use the Vector >>>>>>>>>> types to take >>>>>>>>>> advantage of SIMD >>>>>>>>>> instructions (as >>>>>>>>>> well as a few >>>>>>>>>> SSE2 intrinsics). >>>>>>>>>> >>>>>>>>>> This worked in >>>>>>>>>> LLVM 2.8 but >>>>>>>>>> started failing >>>>>>>>>> in 3.2 and has >>>>>>>>>> continued to fail >>>>>>>>>> in 3.3. It fails >>>>>>>>>> with no >>>>>>>>>> optimizations >>>>>>>>>> applied to the >>>>>>>>>> LLVM >>>>>>>>>> Function/Module. >>>>>>>>>> It crashes with >>>>>>>>>> what is reported >>>>>>>>>> as a memory >>>>>>>>>> access error >>>>>>>>>> (accessing >>>>>>>>>> 0xffffffff), >>>>>>>>>> however it's >>>>>>>>>> suggested that >>>>>>>>>> this is how the >>>>>>>>>> SSE fault raising >>>>>>>>>> mechanism appears. 
>>>>>>>>>> >>>>>>>>>> The generated >>>>>>>>>> instruction >>>>>>>>>> varies, but it >>>>>>>>>> seems to often be >>>>>>>>>> similar to (I >>>>>>>>>> don't have it in >>>>>>>>>> front of me, sorry): >>>>>>>>>> movapd xmm0, >>>>>>>>>> xmm[ecx+0x???????] >>>>>>>>>> Where the xmm >>>>>>>>>> register changes, >>>>>>>>>> and the second >>>>>>>>>> parameter is a >>>>>>>>>> memory access. >>>>>>>>>> ECX is always set >>>>>>>>>> to 0x7ffffff - >>>>>>>>>> however I don't >>>>>>>>>> know if this is >>>>>>>>>> part of the SSE >>>>>>>>>> error reporting >>>>>>>>>> process or is >>>>>>>>>> part of the >>>>>>>>>> situation causing >>>>>>>>>> the error. >>>>>>>>>> >>>>>>>>>> I haven't worked >>>>>>>>>> out exactly what >>>>>>>>>> code path etc is >>>>>>>>>> causing this >>>>>>>>>> crash. I'm hoping >>>>>>>>>> that someone can >>>>>>>>>> tell me if there >>>>>>>>>> were any changed >>>>>>>>>> requirements for >>>>>>>>>> working with SIMD >>>>>>>>>> in LLVM 3.2 (or >>>>>>>>>> earlier, we >>>>>>>>>> haven't tried 3.0 >>>>>>>>>> or 3.1). I >>>>>>>>>> currently suspect >>>>>>>>>> the use of >>>>>>>>>> GlobalVariable >>>>>>>>>> (we first >>>>>>>>>> discovered the >>>>>>>>>> crash when using >>>>>>>>>> a feature that >>>>>>>>>> uses them), >>>>>>>>>> however I have >>>>>>>>>> attempted using >>>>>>>>>> setAlignment on >>>>>>>>>> the >>>>>>>>>> GlobalVariables >>>>>>>>>> without any change. 
>>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Peter N >>>>>>>>>> _______________________________________________ >>>>>>>>>> LLVM Developers >>>>>>>>>> mailing list >>>>>>>>>> LLVMdev at cs.uiuc.edu >>>>>>>>>> >>>>>>>>>> http://llvm.cs.uiuc.edu >>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> LLVM Developers mailing list >>>>>>>>>> LLVMdev at cs.uiuc.edu >>>>>>>>>> >>>>>>>>>> http://llvm.cs.uiuc.edu >>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> ~Craig >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> ~Craig >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> ~Craig >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> ~Craig >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> ~Craig >>>> >>>> >>>> >>>> >>>> -- >>>> ~Craig >>> >>> >>> >>> >>> -- >>> ~Craig >> >> >> >> >> -- >> ~Craig > > > > > -- > ~Craig -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.tweed at arm.com Fri Jul 19 02:43:10 2013 From: david.tweed at arm.com (David Tweed) Date: Fri, 19 Jul 2013 10:43:10 +0100 Subject: [LLVMdev] LLVM 3.3 JIT code speed In-Reply-To: <41E2E7A4-4508-4014-9FE7-54FD7320BC73@grame.fr> References: <6E47DFE5-C272-4AF5-B348-1CCB0ABA5561@grame.fr> <9E89BCB4-5B8F-438C-8716-2CE0EAA3757B@grame.fr> <0983E6C011D2DC4188F8761B533492DE564019C0@ORSMSX104.amr.corp.intel.com> <37D17E8A-300B-40EC-B96F-B12044C28D3A@grame.fr> <41E2E7A4-4508-4014-9FE7-54FD7320BC73@grame.fr> Message-ID: <000001ce8464$68caa4b0$3a5fee10$@tweed@arm.com> Hi, | And since the 1) DSL ==> C/C++ ===> clang/gcc -03 ===> exec code chain has the "correct" speed, there is no reason the JIT based one should be slower right? | So I still guess something is wrong in the way we use the JIT and/or some LTO issue possibly? 
When you say "slower" wrt 3.1 on LLVM and the same speed for clang, could you put some rough time numbers on things for some fixed testcode for your DSL? Obviously they won't have an absolute meaning, but the order of magnitude relative to the normal execution times might guide the ideas about what it could be. Cheers, Dave From kumarsukhani at gmail.com Fri Jul 19 05:36:03 2013 From: kumarsukhani at gmail.com (Kumar Sukhani) Date: Fri, 19 Jul 2013 18:06:03 +0530 Subject: [LLVMdev] Compiling "vmkit" on Ubuntu_x64 - Error: missing argument to --bindir Message-ID: To compile vmkit on Ubuntu 12.04 64-bit machine, I followed the steps giving here [1]. but when I run ./configure I am getting following error- root at komal:/home/komal/Desktop/GSOC/vmkit/vmkit# ./configure >> -with-llvm-config-path=../llvm-3.3.src/configure >> --with-gnu-classpath-glibj=/usr/local/classpath/share/classpath/glibj.zip >> --with-gnu-classpath-libs=/usr/local/classpath/lib/classpath > > checking build system type... x86_64-unknown-linux-gnu > > checking host system type... x86_64-unknown-linux-gnu > > checking target system type... x86_64-unknown-linux-gnu > > checking type of operating system we're going to host on... Linux > > configure: error: missing argument to --bindir > > configure: error: Cannot find (or not executable) > > I tried searching it online but didn't got any similar issue. [1] http://vmkit.llvm.org/get_started.html -- Kumar Sukhani +919579650250 -------------- next part -------------- An HTML attachment was scrubbed... URL: From h.bakiras at gmail.com Fri Jul 19 06:22:31 2013 From: h.bakiras at gmail.com (Harris BAKIRAS) Date: Fri, 19 Jul 2013 15:22:31 +0200 Subject: [LLVMdev] Compiling "vmkit" on Ubuntu_x64 - Error: missing argument to --bindir In-Reply-To: References: Message-ID: <51E93D97.10103@gmail.com> Hi Kumar, There is an error on your configuration line, you should provide the path to llvm-config binary instead of configure file. 
Assuming that you compiled llvm in release mode, the llvm-config binary is located in : YOUR_PATH_TO_LLVM/Release+Asserts/bin/llvm-config Try to change the -with-llvm-config-path option and it will compile. Harris Bakiras On 07/19/2013 02:36 PM, Kumar Sukhani wrote: > To compile vmkit on Ubuntu 12.04 64-bit machine, I followed the steps > giving here [1]. > but when I run ./configure I am getting following error- > > root at komal:/home/komal/Desktop/GSOC/vmkit/vmkit# ./configure > -with-llvm-config-path=../llvm-3.3.src/configure > --with-gnu-classpath-glibj=/usr/local/classpath/share/classpath/glibj.zip > --with-gnu-classpath-libs=/usr/local/classpath/lib/classpath > > checking build system type... x86_64-unknown-linux-gnu > > checking host system type... x86_64-unknown-linux-gnu > > checking target system type... x86_64-unknown-linux-gnu > > checking type of operating system we're going to host on... Linux > > configure: error: missing argument to --bindir > > configure: error: Cannot find (or not executable) > > > I tried searching it online but didn't got any similar issue. > > [1] http://vmkit.llvm.org/get_started.html > > -- > Kumar Sukhani > +919579650250 > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From rnk at google.com Fri Jul 19 06:34:15 2013 From: rnk at google.com (Reid Kleckner) Date: Fri, 19 Jul 2013 09:34:15 -0400 Subject: [LLVMdev] [RFC] Switching make check to use 'set -o pipefail' In-Reply-To: References: Message-ID: On Wed, Jul 17, 2013 at 9:48 PM, Rafael Espíndola < rafael.espindola at gmail.com> wrote: > > Hi Rafael, > > > > Did this discussion ever get a conclusion? I support enabling > > pipefail. Fallout for out of tree users should be easy to fix. 
As we > > learned from LLVM tests, almost all tests that start to fail actually > > indicate a real problem that was hidden. > > So far I got some positive feedback, but no strong LGTM from someone > in the area :-( > +1 more for pipefail, if that helps. :) The only standing objection has to do with out-of-tree target maintainers, and honestly I think they'll be fine. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kumarsukhani at gmail.com Fri Jul 19 06:47:10 2013 From: kumarsukhani at gmail.com (Kumar Sukhani) Date: Fri, 19 Jul 2013 19:17:10 +0530 Subject: [LLVMdev] Compiling "vmkit" on Ubuntu_x64 - Error: missing argument to --bindir In-Reply-To: <51E93D97.10103@gmail.com> References: <51E93D97.10103@gmail.com> Message-ID: Hi Harris Bakiras, Thanks for reply. It working now. Actually I wanted to try vmkit VM to run jruby codes. vmkit is able to run Java program, but when I try to run JRuby code then I get following error - root at komal:/home/komal/Desktop/GSOC/programs# jruby hello.rb > > Platform.java:39:in `getPackageName': java.lang.NullPointerException > > from ConstantSet.java:84:in `getEnumClass' > > from ConstantSet.java:60:in `getConstantSet' > > from ConstantResolver.java:181:in `getConstants' > > from ConstantResolver.java:102:in `getConstant' > > from ConstantResolver.java:146:in `intValue' > > from OpenFlags.java:28:in `value' > > from RubyFile.java:254:in `createFileClass' > > from Ruby.java:1273:in `initCore' > > from Ruby.java:1101:in `bootstrap' > > from Ruby.java:1079:in `init' > > from Ruby.java:179:in `newInstance' > > from Main.java:217:in `run' > > from Main.java:128:in `run' > > from Main.java:97:in `main' > > Can you tell me what will be the issue ? Vmkit doesn't work with OpenJDK ? On Fri, Jul 19, 2013 at 6:52 PM, Harris BAKIRAS wrote: > Hi Kumar, > > There is an error on your configuration line, you should provide the path > to llvm-config binary instead of configure file. 
> Assuming that you compiled llvm in release mode, the llvm-config binary is > located in : > > YOUR_PATH_TO_LLVM/Release+Asserts/bin/llvm-config > > Try to change the -with-llvm-config-path option and it will compile. > > Harris Bakiras > > On 07/19/2013 02:36 PM, Kumar Sukhani wrote: > > To compile vmkit on Ubuntu 12.04 64-bit machine, I followed the steps > giving here [1]. > but when I run ./configure I am getting following error- > > root at komal:/home/komal/Desktop/GSOC/vmkit/vmkit# ./configure >>> -with-llvm-config-path=../llvm-3.3.src/configure >>> --with-gnu-classpath-glibj=/usr/local/classpath/share/classpath/glibj.zip >>> --with-gnu-classpath-libs=/usr/local/classpath/lib/classpath >> >> checking build system type... x86_64-unknown-linux-gnu >> >> checking host system type... x86_64-unknown-linux-gnu >> >> checking target system type... x86_64-unknown-linux-gnu >> >> checking type of operating system we're going to host on... Linux >> >> configure: error: missing argument to --bindir >> >> configure: error: Cannot find (or not executable) >> >> > I tried searching it online but didn't got any similar issue. > > [1] http://vmkit.llvm.org/get_started.html > > -- > Kumar Sukhani > +919579650250 > > > _______________________________________________ > LLVM Developers mailing listLLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -- Kumar Sukhani +919579650250 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From criswell at illinois.edu Fri Jul 19 07:03:11 2013 From: criswell at illinois.edu (John Criswell) Date: Fri, 19 Jul 2013 09:03:11 -0500 Subject: [LLVMdev] llva-emu In-Reply-To: <1374222027.3621.YahooMailNeo@web172606.mail.ir2.yahoo.com> References: <1374222027.3621.YahooMailNeo@web172606.mail.ir2.yahoo.com> Message-ID: <51E9471F.1060407@illinois.edu> On 7/19/13 3:20 AM, Herbei Dacian wrote: > > > Hi All, > can anyone tell me where I can find the sources for the llva-emu project? The llva-emu code is extremely old and, as I recall, not very feature-filled. It was also done for a class project (I don't think it was used for the original LLVA publication, and it wasn't used for any of the subsequent LLVA/SVA publications on which I worked). For what purpose did you need the code? -- John T. > I've tried to contact Michael Brukman or > Brian Gaeke but no reply. > thank you for any help, > dacian > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From letz at grame.fr Fri Jul 19 07:18:01 2013 From: letz at grame.fr (=?windows-1252?Q?St=E9phane_Letz?=) Date: Fri, 19 Jul 2013 16:18:01 +0200 Subject: [LLVMdev] LLVM 3.3 JIT code speed In-Reply-To: <000001ce8464$68caa4b0$3a5fee10$@tweed@arm.com> References: <6E47DFE5-C272-4AF5-B348-1CCB0ABA5561@grame.fr> <9E89BCB4-5B8F-438C-8716-2CE0EAA3757B@grame.fr> <0983E6C011D2DC4188F8761B533492DE564019C0@ORSMSX104.amr.corp.intel.com> <37D17E8A-300B-40EC-B96F-B12044C28D3A@grame.fr> <41E2E7A4-4508-4014-9FE7-54FD7320BC73@grame.fr> <000001ce8464$68caa4b0$3a5fee10$@tweed@arm.com> Message-ID: <0508B852-05EB-402A-A105-588F0CC2E682@grame.fr> Le 19 juil. 
2013 à 11:43, David Tweed a écrit : > Hi, > > | And since the 1) DSL ==> C/C++ ===> clang/gcc -03 ===> exec code chain > has the "correct" speed, there is no reason the JIT based one should be > slower right? > > | So I still guess something is wrong in the way we use the JIT and/or some > LTO issue possibly? > > When you say "slower" wrt 3.1 on LLVM and the same speed for clang, could > you put some rough time numbers on things for some fixed testcode for your > DSL? Obviously they won't have an absolute meaning, but the order of > magnitude relative to the normal execution times might guide the ideas about > what it could be. > > Cheers, > Dave > About 20% slower with LLVM JIT 3.3 compared to clang 3.3, clang 3.1 and LLVM JIT 3.1 Stéphane From h.bakiras at gmail.com Fri Jul 19 07:31:46 2013 From: h.bakiras at gmail.com (Harris BAKIRAS) Date: Fri, 19 Jul 2013 16:31:46 +0200 Subject: [LLVMdev] Compiling "vmkit" on Ubuntu_x64 - Error: missing argument to --bindir In-Reply-To: References: <51E93D97.10103@gmail.com> Message-ID: <51E94DD2.4060802@gmail.com> I don't know how JRuby works, maybe it uses some new feature that GNU Classpath does not provide. VMKit's openJDK version is unstable on 64 bits since package version 6b27. You can still use it for very small programs which does not need GC but that's all. It works fine on 32 bits. So you can try it on 32 bits or revert your java version to a previous one (< than 6b27) to test it on 64 bits. We are working on fixing the 64 bits issue as soon as possible. Harris Bakiras On 07/19/2013 03:47 PM, Kumar Sukhani wrote: > Hi Harris Bakiras, > Thanks for reply. It working now. > Actually I wanted to try vmkit VM to run jruby codes. 
> > vmkit is able to run Java program, but when I try to run JRuby code > then I get following error - > > root at komal:/home/komal/Desktop/GSOC/programs# jruby hello.rb > > Platform.java:39:in `getPackageName': > java.lang.NullPointerException > > from ConstantSet.java:84:in `getEnumClass' > > from ConstantSet.java:60:in `getConstantSet' > > from ConstantResolver.java:181:in `getConstants' > > from ConstantResolver.java:102:in `getConstant' > > from ConstantResolver.java:146:in `intValue' > > from OpenFlags.java:28:in `value' > > from RubyFile.java:254:in `createFileClass' > > from Ruby.java:1273:in `initCore' > > from Ruby.java:1101:in `bootstrap' > > from Ruby.java:1079:in `init' > > from Ruby.java:179:in `newInstance' > > from Main.java:217:in `run' > > from Main.java:128:in `run' > > from Main.java:97:in `main' > > > Can you tell me what will be the issue ? > Vmkit doesn't work with OpenJDK ? > > On Fri, Jul 19, 2013 at 6:52 PM, Harris BAKIRAS > wrote: > > Hi Kumar, > > There is an error on your configuration line, you should provide > the path to llvm-config binary instead of configure file. > Assuming that you compiled llvm in release mode, the llvm-config > binary is located in : > > YOUR_PATH_TO_LLVM/Release+Asserts/bin/llvm-config > > Try to change the -with-llvm-config-path option and it will compile. > > Harris Bakiras > > On 07/19/2013 02:36 PM, Kumar Sukhani wrote: >> To compile vmkit on Ubuntu 12.04 64-bit machine, I followed the >> steps giving here [1]. >> but when I run ./configure I am getting following error- >> >> root at komal:/home/komal/Desktop/GSOC/vmkit/vmkit# >> ./configure >> -with-llvm-config-path=../llvm-3.3.src/configure >> --with-gnu-classpath-glibj=/usr/local/classpath/share/classpath/glibj.zip >> --with-gnu-classpath-libs=/usr/local/classpath/lib/classpath >> >> checking build system type... x86_64-unknown-linux-gnu >> >> checking host system type... x86_64-unknown-linux-gnu >> >> checking target system type... 
x86_64-unknown-linux-gnu >> >> checking type of operating system we're going to host >> on... Linux >> >> configure: error: missing argument to --bindir >> >> configure: error: Cannot find (or not executable) >> >> >> I tried searching it online but didn't got any similar issue. >> >> [1] http://vmkit.llvm.org/get_started.html >> >> -- >> Kumar Sukhani >> +919579650250 >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu > http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > -- > Kumar Sukhani > +919579650250 -------------- next part -------------- An HTML attachment was scrubbed... URL: From dacian_herbei at yahoo.fr Fri Jul 19 07:36:07 2013 From: dacian_herbei at yahoo.fr (Herbei Dacian) Date: Fri, 19 Jul 2013 15:36:07 +0100 (BST) Subject: [LLVMdev] llva-emu In-Reply-To: <51E9471F.1060407@illinois.edu> References: <1374222027.3621.YahooMailNeo@web172606.mail.ir2.yahoo.com> <51E9471F.1060407@illinois.edu> Message-ID: <1374244567.28856.YahooMailNeo@web172602.mail.ir2.yahoo.com> Hi John, Thank you for response. I would like to develop it further. I want to have something like singularity developed by ms. I will removed most of the drivers in the beginning and run it on a very simple embedded system. That is one part. The other part is that I would like to get measurements of how memory gets allocated/accessed and other statistical values about the runtime. best regards, dacian ________________________________ From: John Criswell To: Herbei Dacian Cc: "llvmdev at cs.uiuc.edu" Sent: Friday, 19 July 2013, 16:03 Subject: Re: [LLVMdev] llva-emu On 7/19/13 3:20 AM, Herbei Dacian wrote: > > >Hi All, >can anyone tell me where I can find the sources for the llva-emu project? 
> The llva-emu code is extremely old and, as I recall, not very feature-filled.  It was also done for a class project (I don't think it was used for the original LLVA publication, and it wasn't used for any of the subsequent LLVA/SVA publications on which I worked). For what purpose did you need the code? -- John T. I've tried to contact  Michael Brukman or Brian Gaeke but no reply. >thank you for any help, >dacian > > > >_______________________________________________ LLVM Developers mailing list LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From grosbach at apple.com Fri Jul 19 08:05:41 2013 From: grosbach at apple.com (Jim Grosbach) Date: Fri, 19 Jul 2013 08:05:41 -0700 Subject: [LLVMdev] Proposal: function prefix data In-Reply-To: References: <20130718010609.GA17472@pcc.me.uk> Message-ID: On Jul 18, 2013, at 12:23 PM, Tyler Hardin wrote: > As much as I like this idea for it's use in languages with type systems like Haskell and Scheme, this proposal would limit LLVM to non-Harvard architectures. That's generally a really small minority of all processors, but it would mean there could never be a clang-avr. > LLVM already effectively assumes a unified address space. The pointer address space attributes and such come close to what would be required for good Harvard support, but it’s not enough. In particular there’s the fundamental assumption that all pointers are the same size. For embedded Harvard arches, that’s often not the case. -JIm > An alternative you could use is, instead of using the function pointer as the variable where you are referring to a function, you could have the variable be a pointer to a static struct with the data and the actual function pointer. Basically, it's like how static class variables as handled in C++. > > I don't know LLVM IR, so I'll use C to explain. 
> > Instead of this: > > void func(void){} > > int main(){ > func(); > return 0; > } > > You could do this: > > void func(void){} > > /* You have to initialize this at compile time. */ > struct { > char* data; > int len; > void (*ptr)(void) = func; > } func_data; > > int main(){ > func_data.ptr(); > return 0; > } > > On Jul 18, 2013 12:47 PM, "Jevin Sweval" wrote: > On Wed, Jul 17, 2013 at 9:06 PM, Peter Collingbourne wrote: > > > > To maintain the semantics of ordinary function calls, the prefix data > > must have a particular format. Specifically, it must begin with a > > sequence of bytes which decode to a sequence of machine instructions, > > valid for the module's target, which transfer control to the point > > immediately succeeding the prefix data, without performing any other > > visible action. This allows the inliner and other passes to reason > > about the semantics of the function definition without needing to > > reason about the prefix data. Obviously this makes the format of the > > prefix data highly target dependent. > > > What if the prefix data was stored before the start of the function > code? The function's symbol will point to the code just as before, > eliminating the need to have instructions that skip the prefix data. > > It would look something like: > | Prefix Data ... (variable length) | Prefix Data Length (fixed length > [32 bits?]) | Function code .... | > > ^ function symbol points here (function code) > > I hope the simple ASCII art makes it through my mail client. 
> > To access the data, you do > > prefix_data = function_ptr - sizeof(prefix_length) - prefix_length > > Cheers, > Jevin > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From rafael.espindola at gmail.com Fri Jul 19 08:42:57 2013 From: rafael.espindola at gmail.com (Rafael Avila de Espindola) Date: Fri, 19 Jul 2013 11:42:57 -0400 Subject: [LLVMdev] clang searching for many linux directories that do not exist on FreeBSD host In-Reply-To: <990E87DE-597C-492B-A14F-191C42EF53F8@cl.cam.ac.uk> References: <51E82E36.7020505@pix.net> <990E87DE-597C-492B-A14F-191C42EF53F8@cl.cam.ac.uk> Message-ID: <47A57AF0-487A-431F-AD2A-7CB2BDCA2030@gmail.com> Probably. The gcc class should know the structure of a gcc installation, but the Linux paths should be in another class. Sent from my iPhone On 2013-07-19, at 4:07, David Chisnall wrote: > On 19 Jul 2013, at 02:14, Eli Friedman wrote: > >> It's straightforward: you just need to make toolchains::FreeBSD >> inherit directly from ToolChain and implement all the methods it would >> otherwise inherit from Generic_ELF (which in turn inherits from >> Generic_GCC). > > Wouldn't it make more sense to move the Linux-specific code out of Generic_GCC and into the Linux toolchain, rather than making all of the other subclasses of Generic_GCC reimplement the common code? 
> > David > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From kumarsukhani at gmail.com Fri Jul 19 08:50:40 2013 From: kumarsukhani at gmail.com (Kumar Sukhani) Date: Fri, 19 Jul 2013 21:20:40 +0530 Subject: [LLVMdev] Compiling "vmkit" on Ubuntu_x64 - Error: missing argument to --bindir In-Reply-To: <51E94DD2.4060802@gmail.com> References: <51E93D97.10103@gmail.com> <51E94DD2.4060802@gmail.com> Message-ID: I am working on a project to port JRuby on Embedded systems. JRuby converts Ruby code to bytecode which is executed by any JVM. For this project I am testing performance of JRuby with various available JVMs. I have chosen ARM architecture. Does vmkit support ARM architecture? On Fri, Jul 19, 2013 at 8:01 PM, Harris BAKIRAS wrote: > I don't know how JRuby works, maybe it uses some new feature that GNU > Classpath does not provide. > > VMKit's openJDK version is unstable on 64 bits since package version 6b27. > You can still use it for very small programs which does not need GC but > that's all. > > It works fine on 32 bits. > So you can try it on 32 bits or revert your java version to a previous one > (< than 6b27) to test it on 64 bits. > > We are working on fixing the 64 bits issue as soon as possible. > > Harris Bakiras > > On 07/19/2013 03:47 PM, Kumar Sukhani wrote: > > Hi Harris Bakiras, > Thanks for reply. It working now. > Actually I wanted to try vmkit VM to run jruby codes. 
> > vmkit is able to run Java program, but when I try to run JRuby code then > I get following error - > > root at komal:/home/komal/Desktop/GSOC/programs# jruby hello.rb >> >> Platform.java:39:in `getPackageName': java.lang.NullPointerException >> >> from ConstantSet.java:84:in `getEnumClass' >> >> from ConstantSet.java:60:in `getConstantSet' >> >> from ConstantResolver.java:181:in `getConstants' >> >> from ConstantResolver.java:102:in `getConstant' >> >> from ConstantResolver.java:146:in `intValue' >> >> from OpenFlags.java:28:in `value' >> >> from RubyFile.java:254:in `createFileClass' >> >> from Ruby.java:1273:in `initCore' >> >> from Ruby.java:1101:in `bootstrap' >> >> from Ruby.java:1079:in `init' >> >> from Ruby.java:179:in `newInstance' >> >> from Main.java:217:in `run' >> >> from Main.java:128:in `run' >> >> from Main.java:97:in `main' >> >> > Can you tell me what will be the issue ? > Vmkit doesn't work with OpenJDK ? > > On Fri, Jul 19, 2013 at 6:52 PM, Harris BAKIRAS wrote: > >> Hi Kumar, >> >> There is an error on your configuration line, you should provide the path >> to llvm-config binary instead of configure file. >> Assuming that you compiled llvm in release mode, the llvm-config binary >> is located in : >> >> YOUR_PATH_TO_LLVM/Release+Asserts/bin/llvm-config >> >> Try to change the -with-llvm-config-path option and it will compile. >> >> Harris Bakiras >> >> On 07/19/2013 02:36 PM, Kumar Sukhani wrote: >> >> To compile vmkit on Ubuntu 12.04 64-bit machine, I followed the steps >> giving here [1]. >> but when I run ./configure I am getting following error- >> >> root at komal:/home/komal/Desktop/GSOC/vmkit/vmkit# ./configure >>>> -with-llvm-config-path=../llvm-3.3.src/configure >>>> --with-gnu-classpath-glibj=/usr/local/classpath/share/classpath/glibj.zip >>>> --with-gnu-classpath-libs=/usr/local/classpath/lib/classpath >>> >>> checking build system type... x86_64-unknown-linux-gnu >>> >>> checking host system type... 
x86_64-unknown-linux-gnu >>> >>> checking target system type... x86_64-unknown-linux-gnu >>> >>> checking type of operating system we're going to host on... Linux >>> >>> configure: error: missing argument to --bindir >>> >>> configure: error: Cannot find (or not executable) >>> >>> >> I tried searching it online but didn't got any similar issue. >> >> [1] http://vmkit.llvm.org/get_started.html >> >> -- >> Kumar Sukhani >> +919579650250 >> >> >> _______________________________________________ >> LLVM Developers mailing listLLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> > > > -- > Kumar Sukhani > +919579650250 > > > -- Kumar Sukhani +919579650250 -------------- next part -------------- An HTML attachment was scrubbed... URL: From dblaikie at gmail.com Fri Jul 19 09:46:48 2013 From: dblaikie at gmail.com (David Blaikie) Date: Fri, 19 Jul 2013 09:46:48 -0700 Subject: [LLVMdev] About LLVM switch instruction In-Reply-To: References: <71D0FF69-9F1C-47EF-A197-38059D9F2F1D@apple.com> Message-ID: On Wed, Jul 17, 2013 at 11:58 PM, Hongbin Zheng wrote: > Hi Milind, > > My suggestion just for your concern that if you eliminate the default block, > a block associated with a case value will become the default block of the > swhich instruction, since a switch instruction always requires a default > block. > But when a block associated with a case value become the default block, the > associated case value is lost and may confuse the later optimizations such > as constant propagation. 
> > To prevent such information lost when you eliminate the default block and > make a block associated with a case value will become the default block, you > can attach a metadata[1] to the switch instruction to provide the case value > of the default block. If I'm understanding you correctly you're suggesting removing the default block & representing it in metadata only (instead of using an existing case block as the default as well). That seems like it would break the invariant that every switch has a default (if we didn't have that invariant we would just eliminate the unreachable default block & not bother associating it with an existing case), wouldn't it? Metadata has to be droppable (because not all passes preserve/understand all metadata) so we can't rely on the preservation of the metadata to maintain the switch instructions invariant. > > In order to take the advantage of the attached metadata for the default case > of the switch instruction you also need to modify the later optimization > accordingly. > > Thanks > Hongbin > > [1]http://blog.llvm.org/2010/04/extensible-metadata-in-llvm-ir.html > > > > On Thu, Jul 18, 2013 at 2:30 PM, Milind Chabbi > wrote: >> >> Hongbin >> >> Can you elaborate more on your suggestion? I am not sure I fully >> understand what you suggested. >> >> -Milind >> >> On Wed, Jul 17, 2013 at 11:11 PM, Hongbin Zheng >> wrote: >> > Hi Milind, >> > >> > Maybe you could annotate the default case value as metadata to the swith >> > instruction. >> > >> > Thanks >> > Hongbin >> > >> > >> > On Thu, Jul 18, 2013 at 1:09 PM, Milind Chabbi >> > wrote: >> >> >> >> Hi Mark, >> >> >> >> This will workaround the problem of "default" branch restriction on >> >> the switch instruction. The trouble with this technique is that it >> >> will trump later optimization phases such as constant propagation. >> >> When a block was part of a case, because of the knowledge of the case >> >> value, the block was a candidate for better optimization. 
However, >> >> when we move the body of the case into the default, the knowledge of >> >> the case value is lost and the body is less optimizable. >> >> >> >> -Milind >> >> >> >> >> >> On Wed, Jul 17, 2013 at 9:29 PM, Mark Lacey >> >> wrote: >> >> > On Jul 17, 2013, at 9:01 PM, Milind Chabbi >> >> > wrote: >> >> >> I am performing a transformation that requires changing the targets >> >> >> of >> >> >> a basic block ending with a switch instruction. >> >> >> In particular, I need to delete the edge that goes to the "default" >> >> >> basic block. >> >> >> But, LLVM switch instruction always wants a default target basic >> >> >> block >> >> >> for a switch instruction. >> >> >> It is not clear how to accomplish this, since I don't have a >> >> >> replacement default target block. >> >> >> I could potentially fake that edge to be one of the other case label >> >> >> targets, but that is an ugly hack and I don't want to do that. >> >> >> I would appreciate if you can suggest better alternatives. >> >> > >> >> > Hi Milind, >> >> > >> >> > If you make the "default" branch to a block that has an >> >> > UnreachableInst >> >> > as a terminator, the SimplifyCFG pass will remove one of the switch >> >> > cases >> >> > and replace the block that the default branches to with the block >> >> > that this >> >> > removed case branches to. This sounds a lot like the "ugly hack" that >> >> > you >> >> > would like to avoid. Would it be a reasonable solution for what you >> >> > are >> >> > trying to accomplish? 
>> >> > >> >> > Mark >> >> > >> >> > >> >> >> >> _______________________________________________ >> >> LLVM Developers mailing list >> >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> > >> > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From dblaikie at gmail.com Fri Jul 19 10:03:17 2013 From: dblaikie at gmail.com (David Blaikie) Date: Fri, 19 Jul 2013 10:03:17 -0700 Subject: [LLVMdev] [RFC] Switching make check to use 'set -o pipefail' In-Reply-To: References: Message-ID: On Fri, Jul 19, 2013 at 6:34 AM, Reid Kleckner wrote: > On Wed, Jul 17, 2013 at 9:48 PM, Rafael Espíndola > wrote: >> >> > Hi Rafael, >> > >> > Did this discussion ever get a conclusion? I support enabling >> > pipefail. Fallout for out of tree users should be easy to fix. As we >> > learned from LLVM tests, almost all tests that start to fail actually >> > indicate a real problem that was hidden. >> >> So far I got some positive feedback, but no strong LGTM from someone >> in the area :-( > > > +1 more for pipefail, if that helps. :) Another +1. > The only standing objection has to do with out-of-tree target maintainers, > and honestly I think they'll be fine. I'm sure they will be. This is not the worst thing they have to live with (the API churn is massive) & that's a tradeoff we/they deliberately make for this project: trunk moves forward. 
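For reference, the behaviour being proposed is bash's `pipefail` option: without it a pipeline reports only the last command's exit status, so a failing tool earlier in the pipe (e.g. a crashing `opt` piped into `FileCheck`) goes unnoticed. A minimal demonstration:

```shell
# Without pipefail, the pipeline's status is the last command's status:
bash -c 'false | true; echo $?'                    # prints 0

# With pipefail, any failing stage fails the whole pipeline:
bash -c 'set -o pipefail; false | true; echo $?'   # prints 1
```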
From peter at uformia.com Thu Jul 18 23:34:56 2013 From: peter at uformia.com (Peter Newman) Date: Fri, 19 Jul 2013 16:34:56 +1000 Subject: [LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX In-Reply-To: References: <51E5F5ED.1000808@uformia.com> <7CA02EA7-A9F2-4A91-A10B-24ED35111F3F@cs.stanford.edu> <51E789ED.2080509@uformia.com> <51E8BBE9.8020407@uformia.com> <51E8CADA.1040506@uformia.com> <51E8CE40.3050703@uformia.com> <51E8D26B.3000108@uformia.com> <51E8D468.7030509@uformia.com> Message-ID: <51E8DE10.9090900@uformia.com> (Changing subject line as diagnosis has changed) I'm attaching the compiled code that I've been getting, both with CodeGenOpt::Default and CodeGenOpt::None . The crash isn't occurring with CodeGenOpt::None, but that seems to be because ECX isn't being used - it still gets set to 0x7fffffff by one of the calls to 76719BA1 I notice that X86::SQRTPD[m|r] appear in X86InstrInfo::isHighLatencyDef. I was thinking an optimization might be removing it, but I don't get the sqrtpd instruction even if the createJIT optimization level turned off. I am trying this with the Release 3.3 code - I'll try it with trunk and see if I get a different result there. Maybe there was a recent commit for this. -- Peter N On 19/07/2013 4:00 PM, Craig Topper wrote: > Hmm, I'm not able to get those .ll files to compile if I disable SSE > and I end up with SSE instructions(including sqrtpd) if I don't > disable it. > > > On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman > wrote: > > Is there something specifically required to enable SSE? If it's > not detected as available (based from the target triple?) then I > don't think we enable it specifically. > > Also it seems that it should handle converting to/from the vector > types, although I can see it getting confused about needing to do > that if it thinks SSE isn't available at all. 
> > > On 19/07/2013 3:47 PM, Craig Topper wrote: >> Hmm, maybe sse isn't being enabled so its falling back to >> emulating sqrt? >> >> >> On Thu, Jul 18, 2013 at 10:45 PM, Peter Newman > > wrote: >> >> In the disassembly, I'm seeing three cases of >> call 76719BA1 >> >> I am assuming this is the sqrt function as this is the only >> function called in the LLVM IR. >> >> The code at 76719BA1 is: >> >> 76719BA1 push ebp >> 76719BA2 mov ebp,esp >> 76719BA4 sub esp,20h >> 76719BA7 and esp,0FFFFFFF0h >> 76719BAA fld st(0) >> 76719BAC fst dword ptr [esp+18h] >> 76719BB0 fistp qword ptr [esp+10h] >> 76719BB4 fild qword ptr [esp+10h] >> 76719BB8 mov edx,dword ptr [esp+18h] >> 76719BBC mov eax,dword ptr [esp+10h] >> 76719BC0 test eax,eax >> 76719BC2 je 76719DCF >> 76719BC8 fsubp st(1),st >> 76719BCA test edx,edx >> 76719BCC js 7671F9DB >> 76719BD2 fstp dword ptr [esp] >> 76719BD5 mov ecx,dword ptr [esp] >> 76719BD8 add ecx,7FFFFFFFh >> 76719BDE sbb eax,0 >> 76719BE1 mov edx,dword ptr [esp+14h] >> 76719BE5 sbb edx,0 >> 76719BE8 leave >> 76719BE9 ret >> >> >> As you can see at 76719BD5, it modifies ECX . >> >> I don't know that this is the sqrtpd function (for example, >> I'm not seeing any SSE instructions here?) but whatever it >> is, it's being called from the IR I attached earlier, and is >> modifying ECX under some circumstances. >> >> >> On 19/07/2013 3:29 PM, Craig Topper wrote: >>> That should map directly to sqrtpd which can't modify ecx. >>> >>> >>> On Thu, Jul 18, 2013 at 10:27 PM, Peter Newman >>> > wrote: >>> >>> Sorry, that should have been llvm.x86.sse2.sqrt.pd >>> >>> >>> On 19/07/2013 3:25 PM, Craig Topper wrote: >>>> What is "frep.x86.sse2.sqrt.pd". I'm only familiar with >>>> things prefixed with "llvm.x86". >>>> >>>> >>>> On Thu, Jul 18, 2013 at 10:12 PM, Peter Newman >>>> > wrote: >>>> >>>> After stepping through the produced assembly, I >>>> believe I have a culprit. 
>>>> >>>> One of the calls to @frep.x86.sse2.sqrt.pd is >>>> modifying the value of ECX - while the produced >>>> code is expecting it to still contain its previous >>>> value. >>>> >>>> Peter N >>>> >>>> >>>> On 19/07/2013 2:09 PM, Peter Newman wrote: >>>>> I've attached the module->dump() that our code is >>>>> producing. Unfortunately this is the smallest test >>>>> case I have available. >>>>> >>>>> This is before any optimization passes are >>>>> applied. There are two separate modules in >>>>> existence at the time, and there are no guarantees >>>>> about the order the surrounding code calls those >>>>> functions, so there may be some interaction >>>>> between them? There shouldn't be, they don't refer >>>>> to any common memory etc. There is no >>>>> multi-threading occurring. >>>>> >>>>> The function in module-dump.ll (called crashfunc >>>>> in this file) is called with >>>>> - func_params 0x0018f3b0 double [3] >>>>> [0x0] -11.339976634695301 double >>>>> [0x1] -9.7504239056205506 double >>>>> [0x2] -5.2900856817382804 double >>>>> at the time of the exception. >>>>> >>>>> This is compiled on a "i686-pc-win32" triple. All >>>>> of the non-intrinsic functions referred to in >>>>> these modules are the standard equivalents from >>>>> the MSVC library (e.g. @asin is the standard C lib >>>>> double asin( double ) ). >>>>> >>>>> Hopefully this is reproducible for you. >>>>> >>>>> -- >>>>> PeterN >>>>> >>>>> On 18/07/2013 4:37 PM, Craig Topper wrote: >>>>>> Are you able to send any IR for others to >>>>>> reproduce this issue? >>>>>> >>>>>> >>>>>> On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman >>>>>> > wrote: >>>>>> >>>>>> Unfortunately, this doesn't appear to be the >>>>>> bug I'm hitting. I applied the fix to my >>>>>> source and it didn't make a difference. >>>>>> >>>>>> Also further testing found me getting the >>>>>> same behavior with other SIMD instructions. 
>>>>>> The common factor is in each case, ECX is set >>>>>> to 0x7fffffff, and it's an operation using >>>>>> xmm ptr ecx+offset . >>>>>> >>>>>> Additionally, turning the optimization level >>>>>> passed to createJIT down appears to avoid it, >>>>>> so I'm now leaning towards a bug in one of >>>>>> the optimization passes. >>>>>> >>>>>> I'm going to dig through the passes >>>>>> controlled by that parameter and see if I can >>>>>> narrow down which optimization is causing it. >>>>>> >>>>>> Peter N >>>>>> >>>>>> >>>>>> On 17/07/2013 1:58 PM, Solomon Boulos wrote: >>>>>> >>>>>> As someone off list just told me, perhaps >>>>>> my new bug is the same issue: >>>>>> >>>>>> http://llvm.org/bugs/show_bug.cgi?id=16640 >>>>>> >>>>>> Do you happen to be using FastISel? >>>>>> >>>>>> Solomon >>>>>> >>>>>> On Jul 16, 2013, at 6:39 PM, Peter Newman >>>>>> >>>>> > wrote: >>>>>> >>>>>> Hello all, >>>>>> >>>>>> I'm currently in the process of >>>>>> debugging a crash occurring in our >>>>>> program. In LLVM 3.2 and 3.3 it >>>>>> appears that JIT generated code is >>>>>> attempting to perform access >>>>>> unaligned memory with a SSE2 >>>>>> instruction. However this only >>>>>> happens under certain conditions that >>>>>> seem (but may not be) related to the >>>>>> stacks state on calling the function. >>>>>> >>>>>> Our program acts as a front-end, >>>>>> using the LLVM C++ API to generate a >>>>>> JIT generated function. This function >>>>>> is primarily mathematical, so we use >>>>>> the Vector types to take advantage of >>>>>> SIMD instructions (as well as a few >>>>>> SSE2 intrinsics). >>>>>> >>>>>> This worked in LLVM 2.8 but started >>>>>> failing in 3.2 and has continued to >>>>>> fail in 3.3. It fails with no >>>>>> optimizations applied to the LLVM >>>>>> Function/Module. It crashes with what >>>>>> is reported as a memory access error >>>>>> (accessing 0xffffffff), however it's >>>>>> suggested that this is how the SSE >>>>>> fault raising mechanism appears. 
>>>>>> >>>>>> The generated instruction varies, but >>>>>> it seems to often be similar to (I >>>>>> don't have it in front of me, sorry): >>>>>> movapd xmm0, xmm[ecx+0x???????] >>>>>> Where the xmm register changes, and >>>>>> the second parameter is a memory access. >>>>>> ECX is always set to 0x7ffffff - >>>>>> however I don't know if this is part >>>>>> of the SSE error reporting process or >>>>>> is part of the situation causing the >>>>>> error. >>>>>> >>>>>> I haven't worked out exactly what >>>>>> code path etc is causing this crash. >>>>>> I'm hoping that someone can tell me >>>>>> if there were any changed >>>>>> requirements for working with SIMD in >>>>>> LLVM 3.2 (or earlier, we haven't >>>>>> tried 3.0 or 3.1). I currently >>>>>> suspect the use of GlobalVariable (we >>>>>> first discovered the crash when using >>>>>> a feature that uses them), however I >>>>>> have attempted using setAlignment on >>>>>> the GlobalVariables without any change. >>>>>> >>>>>> -- >>>>>> Peter N >>>>>> _______________________________________________ >>>>>> LLVM Developers mailing list >>>>>> LLVMdev at cs.uiuc.edu >>>>>> >>>>>> http://llvm.cs.uiuc.edu >>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> LLVM Developers mailing list >>>>>> LLVMdev at cs.uiuc.edu >>>>>> >>>>>> http://llvm.cs.uiuc.edu >>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> ~Craig >>>>> >>>> >>>> >>>> >>>> >>>> -- >>>> ~Craig >>> >>> >>> >>> >>> -- >>> ~Craig >> >> >> >> >> -- >> ~Craig > > > > > -- > ~Craig -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- 002E00D0 push ebp 002E00D1 mov ebp,esp 002E00D3 push ebx 002E00D4 push edi 002E00D5 push esi 002E00D6 and esp,0FFFFFFF0h 002E00DC sub esp,110h 002E00E2 mov eax,dword ptr [ebp+8] 002E00E5 movddup xmm0,mmword ptr [eax+10h] 002E00EA movapd xmmword ptr [esp+80h],xmm0 002E00F3 movddup xmm0,mmword ptr [eax+8] 002E00F8 movapd xmmword ptr [esp+70h],xmm0 002E00FE movddup xmm0,mmword ptr [eax] 002E0102 movapd xmmword ptr [esp+60h],xmm0 002E0108 xorpd xmm0,xmm0 002E010C movapd xmmword ptr [esp+0C0h],xmm0 002E0115 xorpd xmm1,xmm1 002E0119 xorpd xmm7,xmm7 002E011D movapd xmmword ptr [esp+0A0h],xmm1 002E0126 movapd xmmword ptr [esp+0B0h],xmm7 002E012F movapd xmm3,xmm1 002E0133 movlpd qword ptr [esp+0F0h],xmm3 002E013C movhpd qword ptr [esp+0E0h],xmm3 002E0145 movlpd qword ptr [esp+100h],xmm7 002E014E pshufd xmm0,xmm7,44h 002E0153 movdqa xmm5,xmm0 002E0157 xorpd xmm4,xmm4 002E015B mulpd xmm5,xmm4 002E015F pshufd xmm2,xmm3,44h 002E0164 movdqa xmm1,xmm2 002E0168 mulpd xmm1,xmm4 002E016C xorpd xmm7,xmm7 002E0170 movapd xmm4,xmmword ptr [esp+70h] 002E0176 subpd xmm4,xmm1 002E017A pshufd xmm3,xmm3,0EEh 002E017F subpd xmm4,xmm3 002E0183 subpd xmm4,xmm5 002E0187 fld qword ptr [esp+0F0h] 002E018E call 76719BA1 CALL 002E0193 imul ebx,eax,0Ch 002E0196 lea esi,[ebx+3] 002E0199 shl esi,4 002E019C movapd xmm6,xmmword ptr [esi+2C0030h] 002E01A4 mulpd xmm6,xmm4 002E01A8 mulpd xmm3,xmm7 002E01AC movapd xmm7,xmmword ptr [esp+60h] 002E01B2 subpd xmm7,xmm2 002E01B6 subpd xmm7,xmm3 002E01BA subpd xmm7,xmm5 002E01BE movapd xmm2,xmmword ptr [esi+2C0020h] 002E01C6 mulpd xmm2,xmm7 002E01CA addpd xmm2,xmm6 002E01CE movapd xmm5,xmmword ptr [esp+80h] 002E01D7 subpd xmm5,xmm1 002E01DB subpd xmm5,xmm3 002E01DF mulpd xmm0,xmmword ptr ds:[2E0010h] 002E01E7 subpd xmm5,xmm0 002E01EB movapd xmm6,xmmword ptr [esi+2C0040h] 002E01F3 mulpd xmm6,xmm5 002E01F7 addpd xmm6,xmm2 002E01FB addpd xmm6,xmmword ptr [esi+2C0050h] 002E0203 fld qword ptr [esp+0E0h] 002E020A call 
76719BA1 CALL 002E020F imul edi,eax,0Ch 002E0212 lea ecx,[edi+3] First time ECX is touched 002E0215 shl ecx,4 002E0218 movapd xmm0,xmmword ptr [ecx+2C0030h] *** 002E0220 mulpd xmm0,xmm6 002E0224 mov eax,ebx 002E0226 shl eax,4 002E0229 movapd xmm1,xmmword ptr [eax+2C0010h] 002E0231 mulpd xmm1,xmm7 002E0235 or ebx,1 002E0238 shl ebx,4 002E023B movapd xmm2,xmmword ptr [ebx+2C0010h] 002E0243 mulpd xmm2,xmm4 002E0247 addpd xmm2,xmm1 002E024B movapd xmm3,xmmword ptr [ebx+2C0020h] 002E0253 mulpd xmm3,xmm5 002E0257 addpd xmm3,xmm2 002E025B addpd xmm3,xmmword ptr [esi+2C0010h] 002E0263 movapd xmm1,xmmword ptr [ecx+2C0020h] *** 002E026B mulpd xmm1,xmm3 002E026F addpd xmm1,xmm0 002E0273 mulpd xmm4,xmmword ptr [esi+2C0070h] 002E027B mulpd xmm7,xmmword ptr [esi+2C0060h] 002E0283 addpd xmm7,xmm4 002E0287 mulpd xmm5,xmmword ptr [esi+2C0080h] 002E028F addpd xmm5,xmm7 002E0293 addpd xmm5,xmmword ptr [esi+2C0090h] 002E029B movapd xmm7,xmmword ptr [ecx+2C0040h] *** 002E02A3 mulpd xmm7,xmm5 002E02A7 addpd xmm7,xmm1 002E02AB addpd xmm7,xmmword ptr [ecx+2C0050h] *** 002E02B3 fld qword ptr [esp+100h] 002E02BA call 76719BA1 CALL 002E02BF imul edx,eax,0Ch 002E02C2 lea eax,[edx+3] 002E02C5 shl eax,4 002E02C8 movapd xmm1,xmmword ptr [eax+2C0130h] 002E02D0 mulpd xmm1,xmm7 002E02D4 lea esi,[edi+1] 002E02D7 shl esi,4 002E02DA movapd xmm0,xmmword ptr [esi+2C0010h] 002E02E2 mulpd xmm0,xmm6 002E02E6 shl edi,4 002E02E9 movapd xmm2,xmmword ptr [edi+2C0010h] 002E02F1 mulpd xmm2,xmm3 002E02F5 addpd xmm2,xmm0 002E02F9 movapd xmm0,xmmword ptr [esi+2C0020h] 002E0301 mulpd xmm0,xmm5 002E0305 addpd xmm0,xmm2 002E0309 addpd xmm0,xmmword ptr [ecx+2C0010h] *** 002E0311 movapd xmm2,xmmword ptr [eax+2C0120h] *** 002E0319 mulpd xmm2,xmm0 002E031D addpd xmm2,xmm1 002E0321 mulpd xmm6,xmmword ptr [ecx+2C0070h] *** 002E0329 mulpd xmm3,xmmword ptr [ecx+2C0060h] *** 002E0331 addpd xmm3,xmm6 002E0335 mulpd xmm5,xmmword ptr [ecx+2C0080h] *** 002E033D addpd xmm5,xmm3 002E0341 addpd xmm5,xmmword ptr [ecx+2C0090h] *** 
002E0349 movapd xmm6,xmmword ptr [eax+2C0140h] *** 002E0351 mulpd xmm6,xmm5 002E0355 addpd xmm6,xmm2 002E0359 addpd xmm6,xmmword ptr [eax+2C0150h] *** 002E0361 movapd xmm2,xmm6 002E0365 mulpd xmm2,xmmword ptr ds:[2E00C0h] 002E036D lea ecx,[edx+1] ECX set 002E0370 shl ecx,4 002E0373 movapd xmm1,xmmword ptr [ecx+2C00D0h] 002E037B mulpd xmm1,xmm7 002E037F shl edx,4 002E0382 movapd xmm3,xmmword ptr [edx+2C00D0h] 002E038A mulpd xmm3,xmm0 002E038E addpd xmm3,xmm1 002E0392 movapd xmm4,xmmword ptr [ecx+2C00E0h] 002E039A mulpd xmm4,xmm5 002E039E addpd xmm4,xmm3 002E03A2 addpd xmm4,xmmword ptr [eax+2C00D0h] 002E03AA movapd xmm3,xmm4 002E03AE addpd xmm3,xmmword ptr ds:[2E0020h] 002E03B6 movapd xmm1,xmm3 002E03BA subpd xmm1,xmm2 002E03BE movapd xmmword ptr [esp+90h],xmm1 002E03C7 movapd xmm2,xmmword ptr ds:[2E0030h] 002E03CF mulpd xmm1,xmm2 002E03D3 mulpd xmm7,xmmword ptr [eax+2C00F0h] 002E03DB mulpd xmm0,xmmword ptr [eax+2C00E0h] 002E03E3 addpd xmm0,xmm7 002E03E7 mulpd xmm5,xmmword ptr [eax+2C0100h] 002E03EF addpd xmm5,xmm0 002E03F3 addpd xmm5,xmmword ptr [eax+2C0110h] 002E03FB movapd xmm0,xmm5 002E03FF addpd xmm0,xmmword ptr ds:[2E0040h] 002E0407 movapd xmm7,xmm0 002E040B movapd xmm2,xmmword ptr ds:[2E0050h] 002E0413 mulpd xmm7,xmm2 002E0417 subpd xmm7,xmm1 002E041B xorpd xmm1,xmm1 002E041F mulpd xmm3,xmm1 002E0423 addpd xmm3,xmm6 002E0427 movapd xmm2,xmm3 002E042B mulpd xmm2,xmm1 002E042F addpd xmm2,xmm7 002E0433 addpd xmm2,xmm1 002E0437 movapd xmm1,xmmword ptr ds:[2E0060h] 002E043F mulpd xmm2,xmm1 002E0443 mulpd xmm2,xmm2 002E0447 movapd xmm1,xmmword ptr [esp+90h] 002E0450 mulpd xmm1,xmmword ptr ds:[2E0050h] 002E0458 mulpd xmm0,xmmword ptr ds:[2E0030h] 002E0460 addpd xmm0,xmm1 002E0464 addpd xmm0,xmmword ptr ds:[2E00C0h] 002E046C movapd xmm1,xmmword ptr ds:[2E0060h] 002E0474 mulpd xmm0,xmm1 002E0478 mulpd xmm0,xmm0 002E047C addpd xmm0,xmm2 002E0480 mulpd xmm7,xmmword ptr ds:[2E00C0h] 002E0488 xorpd xmm2,xmm2 002E048C subpd xmm3,xmm7 002E0490 addpd xmm3,xmm2 002E0494 mulpd 
xmm3,xmm1 002E0498 mulpd xmm3,xmm3 002E049C addpd xmm3,xmm0 002E04A0 movapd xmm7,xmmword ptr ds:[2E0070h] 002E04A8 movapd xmm0,xmm7 002E04AC subpd xmm0,xmm3 002E04B0 movapd xmm1,xmmword ptr ds:[2E0080h] 002E04B8 mulpd xmm5,xmm1 002E04BC mulpd xmm5,xmm5 002E04C0 mulpd xmm4,xmm1 002E04C4 mulpd xmm4,xmm4 002E04C8 addpd xmm4,xmm5 002E04CC mulpd xmm6,xmm1 002E04D0 mulpd xmm6,xmm6 002E04D4 addpd xmm6,xmm4 002E04D8 movapd xmm2,xmm7 002E04DC subpd xmm2,xmm6 002E04E0 movapd xmm1,xmm2 002E04E4 addpd xmm1,xmm0 002E04E8 mulpd xmm0,xmm0 002E04EC mulpd xmm2,xmm2 002E04F0 addpd xmm2,xmm0 002E04F4 sqrtpd xmm0,xmm2 002E04F8 addpd xmm0,xmm1 002E04FC addpd xmm2,xmm7 002E0500 movsd xmm4,mmword ptr ds:[2E0090h] 002E0508 movapd xmm1,xmm4 002E050C divsd xmm1,xmm2 002E0510 unpckhpd xmm2,xmm2 002E0514 movapd xmm3,xmm4 002E0518 divsd xmm3,xmm2 002E051C unpcklpd xmm1,xmm3 002E0520 mulpd xmm1,xmmword ptr ds:[2E00A0h] 002E0528 addpd xmm1,xmm0 002E052C movapd xmm3,xmmword ptr [esp+0A0h] 002E0535 movapd xmm0,xmm3 002E0539 unpckhpd xmm0,xmm0 002E053D movapd xmm2,xmm3 002E0541 movapd xmm6,xmm3 002E0545 addsd xmm2,xmm0 002E0549 movapd xmm3,xmmword ptr [esp+0B0h] 002E0552 addsd xmm2,xmm3 002E0556 movapd xmm7,xmm3 002E055A xorpd xmm3,xmm3 002E055E ucomisd xmm2,xmm3 002E0562 setnp al 002E0565 sete cl 002E0568 test al,cl 002E056A jne 002E059A 002E0570 movapd xmm5,xmmword ptr [esp+0C0h] 002E0579 movapd xmm2,xmm5 002E057D addpd xmm2,xmm1 002E0581 mulpd xmm1,xmm1 002E0585 mulpd xmm5,xmm5 002E0589 addpd xmm5,xmm1 002E058D sqrtpd xmm5,xmm5 002E0591 addpd xmm5,xmm2 002E0595 jmp 002E059E 002E059A movapd xmm5,xmm1 002E059E movapd xmmword ptr [esp+0C0h],xmm5 002E05A7 movapd xmm2,xmm6 002E05AB addsd xmm2,xmm4 002E05AF ucomisd xmm2,xmm4 002E05B3 xorpd xmm1,xmm1 002E05B7 jae 002E05C1 002E05BD movapd xmm1,xmm2 002E05C1 jb 002E05CB 002E05C7 addsd xmm0,xmm4 002E05CB ucomisd xmm0,xmm4 002E05CF xorpd xmm2,xmm2 002E05D3 jae 002E05DD 002E05D9 movapd xmm2,xmm0 002E05DD movsd xmm6,xmm1 002E05E1 unpcklpd xmm6,xmm2 002E05E5 
movapd xmm1,xmm6 002E05E9 movapd xmm0,xmm7 002E05ED jb 002E05F7 002E05F3 addsd xmm0,xmm4 002E05F7 ucomisd xmm0,mmword ptr ds:[2E00B0h] 002E05FF jae 002E0609 002E0605 movapd xmm3,xmm0 002E0609 movsd xmm7,xmm3 002E060D jb 002E011D 002E0613 movapd xmm0,xmmword ptr [esp+0C0h] 002E061C movlpd qword ptr [esp+0D0h],xmm0 002E0625 fld qword ptr [esp+0D0h] 002E062C lea esp,[ebp-0Ch] 002E062F pop esi 002E0630 pop edi 002E0631 pop ebx 002E0632 pop ebp 002E0633 ret -------------- next part -------------- 002B00B8 push ebp 002B00B9 mov ebp,esp 002B00BB and esp,0FFFFFFF0h 002B00C1 sub esp,540h 002B00C7 mov eax,dword ptr [ebp+8] 002B00CA movsd xmm0,mmword ptr [eax+10h] 002B00CF unpcklpd xmm0,xmm0 002B00D3 movsd xmm1,mmword ptr [eax] 002B00D7 movsd xmm2,mmword ptr [eax+8] 002B00DC unpcklpd xmm2,xmm2 002B00E0 unpcklpd xmm1,xmm1 002B00E4 xorps xmm3,xmm3 002B00E7 movaps xmm4,xmm3 002B00EA movaps xmm5,xmm3 002B00ED movaps xmmword ptr [esp+4F0h],xmm2 002B00F5 movaps xmmword ptr [esp+4E0h],xmm0 002B00FD movaps xmmword ptr [esp+4D0h],xmm1 002B0105 movaps xmmword ptr [esp+4C0h],xmm5 002B010D movaps xmmword ptr [esp+4B0h],xmm3 002B0115 movaps xmmword ptr [esp+4A0h],xmm4 002B011D movaps xmm0,xmmword ptr [esp+4C0h] 002B0125 movaps xmm1,xmmword ptr [esp+4B0h] 002B012D movaps xmm2,xmmword ptr [esp+4A0h] 002B0135 movaps xmm3,xmm1 002B0138 movaps xmm4,xmm1 002B013B shufpd xmm4,xmm4,0 002B0140 movaps xmm5,xmmword ptr [esp+4D0h] 002B0148 subpd xmm5,xmm4 002B014C xorps xmm6,xmm6 002B014F mulpd xmm4,xmm6 002B0153 xorps xmm7,xmm7 002B0156 movaps xmmword ptr [esp+490h],xmm0 002B015E movaps xmm0,xmmword ptr [esp+4F0h] 002B0166 subpd xmm0,xmm4 002B016A movaps xmmword ptr [esp+480h],xmm0 002B0172 movaps xmm0,xmmword ptr [esp+4E0h] 002B017A subpd xmm0,xmm4 002B017E movaps xmm4,xmm1 002B0181 shufpd xmm4,xmm4,3 002B0186 movaps xmmword ptr [esp+470h],xmm0 002B018E movaps xmm0,xmm4 002B0191 mulpd xmm0,xmm6 002B0195 subpd xmm5,xmm0 002B0199 movaps xmmword ptr [esp+460h],xmm0 002B01A1 movaps xmm0,xmmword ptr 
[esp+480h] 002B01A9 subpd xmm0,xmm4 002B01AD movaps xmm4,xmmword ptr [esp+470h] 002B01B5 movaps xmmword ptr [esp+450h],xmm0 002B01BD movaps xmm0,xmmword ptr [esp+460h] 002B01C5 subpd xmm4,xmm0 002B01C9 movaps xmm0,xmm2 002B01CC movsd mmword ptr [esp+448h],xmm0 002B01D5 movaps xmm0,xmm2 002B01D8 shufpd xmm0,xmm0,0 002B01DD movaps xmmword ptr [esp+430h],xmm0 002B01E5 mulpd xmm0,xmm6 002B01E9 subpd xmm5,xmm0 002B01ED movaps xmmword ptr [esp+420h],xmm0 002B01F5 movaps xmm0,xmmword ptr [esp+450h] 002B01FD movaps xmmword ptr [esp+410h],xmm1 002B0205 movaps xmm1,xmmword ptr [esp+420h] 002B020D subpd xmm0,xmm1 002B0211 movapd xmm1,xmmword ptr ds:[2B0010h] 002B0219 movaps xmmword ptr [esp+400h],xmm0 002B0221 movaps xmm0,xmmword ptr [esp+430h] 002B0229 mulpd xmm0,xmm1 002B022D subpd xmm4,xmm0 002B0231 movaps xmm0,xmmword ptr [esp+410h] 002B0239 movlpd qword ptr [esp+520h],xmm0 002B0242 fld qword ptr [esp+520h] 002B0249 call 76719BA1 002B024E imul eax,eax,0Ch 002B0251 mov edx,eax 002B0253 shl edx,4 002B0256 movapd xmm1,xmmword ptr [edx+330010h] 002B025E mov edx,eax 002B0260 or edx,1 002B0263 shl edx,4 002B0266 movapd xmm0,xmmword ptr [edx+330010h] 002B026E movaps xmmword ptr [esp+3F0h],xmm0 002B0276 movapd xmm0,xmmword ptr [edx+330020h] 002B027E or eax,3 002B0281 shl eax,4 002B0284 movaps xmmword ptr [esp+3E0h],xmm0 002B028C movapd xmm0,xmmword ptr [eax+330010h] 002B0294 movaps xmmword ptr [esp+3D0h],xmm0 002B029C movapd xmm0,xmmword ptr [eax+330020h] 002B02A4 movaps xmmword ptr [esp+3C0h],xmm0 002B02AC movapd xmm0,xmmword ptr [eax+330030h] 002B02B4 movaps xmmword ptr [esp+3B0h],xmm0 002B02BC movapd xmm0,xmmword ptr [eax+330040h] 002B02C4 movaps xmmword ptr [esp+3A0h],xmm0 002B02CC movapd xmm0,xmmword ptr [eax+330050h] 002B02D4 movaps xmmword ptr [esp+390h],xmm0 002B02DC movapd xmm0,xmmword ptr [eax+330060h] 002B02E4 movaps xmmword ptr [esp+380h],xmm0 002B02EC movapd xmm0,xmmword ptr [eax+330070h] 002B02F4 movaps xmmword ptr [esp+370h],xmm0 002B02FC movapd xmm0,xmmword ptr 
[eax+330080h] 002B0304 movaps xmmword ptr [esp+360h],xmm0 002B030C movapd xmm0,xmmword ptr [eax+330090h] 002B0314 movaps xmmword ptr [esp+350h],xmm0 002B031C movaps xmm0,xmmword ptr [esp+3E0h] 002B0324 mulpd xmm0,xmm4 002B0328 movaps xmmword ptr [esp+340h],xmm0 002B0330 movaps xmm0,xmmword ptr [esp+3F0h] 002B0338 movaps xmmword ptr [esp+330h],xmm1 002B0340 movaps xmm1,xmmword ptr [esp+400h] 002B0348 mulpd xmm0,xmm1 002B034C movaps xmm1,xmmword ptr [esp+330h] 002B0354 mulpd xmm1,xmm5 002B0358 addpd xmm1,xmm0 002B035C movaps xmm0,xmmword ptr [esp+340h] 002B0364 addpd xmm0,xmm1 002B0368 movaps xmm1,xmmword ptr [esp+3D0h] 002B0370 addpd xmm1,xmm0 002B0374 movaps xmm0,xmm4 002B0377 movaps xmmword ptr [esp+320h],xmm1 002B037F movaps xmm1,xmmword ptr [esp+3A0h] 002B0387 mulpd xmm0,xmm1 002B038B movaps xmm1,xmmword ptr [esp+3B0h] 002B0393 movaps xmmword ptr [esp+310h],xmm0 002B039B movaps xmm0,xmmword ptr [esp+400h] 002B03A3 mulpd xmm1,xmm0 002B03A7 movaps xmm0,xmmword ptr [esp+3C0h] 002B03AF mulpd xmm0,xmm5 002B03B3 addpd xmm0,xmm1 002B03B7 movaps xmm1,xmmword ptr [esp+310h] 002B03BF addpd xmm1,xmm0 002B03C3 movaps xmm0,xmmword ptr [esp+390h] 002B03CB addpd xmm0,xmm1 002B03CF movaps xmm1,xmmword ptr [esp+360h] 002B03D7 mulpd xmm4,xmm1 002B03DB movaps xmm1,xmmword ptr [esp+400h] 002B03E3 movaps xmmword ptr [esp+300h],xmm0 002B03EB movaps xmm0,xmmword ptr [esp+370h] 002B03F3 mulpd xmm1,xmm0 002B03F7 movaps xmm0,xmmword ptr [esp+380h] 002B03FF mulpd xmm5,xmm0 002B0403 addpd xmm5,xmm1 002B0407 addpd xmm5,xmm4 002B040B movaps xmm0,xmmword ptr [esp+350h] 002B0413 addpd xmm0,xmm5 002B0417 movaps xmm1,xmmword ptr [esp+410h] 002B041F movhpd qword ptr [esp+510h],xmm1 002B0428 fld qword ptr [esp+510h] 002B042F call 76719BA1 002B0434 imul eax,eax,0Ch 002B0437 mov edx,eax 002B0439 shl edx,4 002B043C movapd xmm4,xmmword ptr [edx+330010h] 002B0444 mov edx,eax 002B0446 or edx,1 002B0449 shl edx,4 002B044C movapd xmm5,xmmword ptr [edx+330010h] 002B0454 movapd xmm1,xmmword ptr 
[edx+330020h] 002B045C or eax,3 002B045F shl eax,4 002B0462 movaps xmmword ptr [esp+2F0h],xmm0 002B046A movapd xmm0,xmmword ptr [eax+330010h] 002B0472 movaps xmmword ptr [esp+2E0h],xmm0 002B047A movapd xmm0,xmmword ptr [eax+330020h] 002B0482 movaps xmmword ptr [esp+2D0h],xmm0 002B048A movapd xmm0,xmmword ptr [eax+330030h] 002B0492 movaps xmmword ptr [esp+2C0h],xmm0 002B049A movapd xmm0,xmmword ptr [eax+330040h] 002B04A2 movaps xmmword ptr [esp+2B0h],xmm0 002B04AA movapd xmm0,xmmword ptr [eax+330050h] 002B04B2 movaps xmmword ptr [esp+2A0h],xmm0 002B04BA movapd xmm0,xmmword ptr [eax+330060h] 002B04C2 movaps xmmword ptr [esp+290h],xmm0 002B04CA movapd xmm0,xmmword ptr [eax+330070h] 002B04D2 movaps xmmword ptr [esp+280h],xmm0 002B04DA movapd xmm0,xmmword ptr [eax+330080h] 002B04E2 movaps xmmword ptr [esp+270h],xmm0 002B04EA movapd xmm0,xmmword ptr [eax+330090h] 002B04F2 movaps xmmword ptr [esp+260h],xmm0 002B04FA movaps xmm0,xmmword ptr [esp+2F0h] 002B0502 mulpd xmm0,xmm1 002B0506 movaps xmm1,xmmword ptr [esp+300h] 002B050E mulpd xmm1,xmm5 002B0512 movaps xmm5,xmmword ptr [esp+320h] 002B051A mulpd xmm5,xmm4 002B051E addpd xmm5,xmm1 002B0522 addpd xmm5,xmm0 002B0526 movaps xmm0,xmmword ptr [esp+2E0h] 002B052E addpd xmm0,xmm5 002B0532 movaps xmm1,xmmword ptr [esp+2F0h] 002B053A movaps xmm4,xmmword ptr [esp+2B0h] 002B0542 mulpd xmm1,xmm4 002B0546 movaps xmm4,xmmword ptr [esp+300h] 002B054E movaps xmm5,xmmword ptr [esp+2C0h] 002B0556 mulpd xmm4,xmm5 002B055A movaps xmm5,xmmword ptr [esp+320h] 002B0562 movaps xmmword ptr [esp+250h],xmm0 002B056A movaps xmm0,xmmword ptr [esp+2D0h] 002B0572 mulpd xmm5,xmm0 002B0576 addpd xmm5,xmm4 002B057A addpd xmm5,xmm1 002B057E movaps xmm0,xmmword ptr [esp+2A0h] 002B0586 addpd xmm0,xmm5 002B058A movaps xmm1,xmmword ptr [esp+2F0h] 002B0592 movaps xmm4,xmmword ptr [esp+270h] 002B059A mulpd xmm1,xmm4 002B059E movaps xmm4,xmmword ptr [esp+300h] 002B05A6 movaps xmm5,xmmword ptr [esp+280h] 002B05AE mulpd xmm4,xmm5 002B05B2 movaps xmm5,xmmword 
ptr [esp+320h] 002B05BA movaps xmmword ptr [esp+240h],xmm0 002B05C2 movaps xmm0,xmmword ptr [esp+290h] 002B05CA mulpd xmm5,xmm0 002B05CE addpd xmm5,xmm4 002B05D2 addpd xmm5,xmm1 002B05D6 movaps xmm0,xmmword ptr [esp+260h] 002B05DE addpd xmm0,xmm5 002B05E2 movlpd qword ptr [esp+530h],xmm2 002B05EB fld qword ptr [esp+530h] 002B05F2 call 76719BA1 002B05F7 imul eax,eax,0Ch 002B05FA mov edx,eax 002B05FC shl edx,4 002B05FF movapd xmm1,xmmword ptr [edx+3300D0h] 002B0607 mov edx,eax 002B0609 or edx,1 002B060C shl edx,4 002B060F movapd xmm4,xmmword ptr [edx+3300D0h] 002B0617 movapd xmm5,xmmword ptr [edx+3300E0h] 002B061F or eax,3 002B0622 shl eax,4 002B0625 movaps xmmword ptr [esp+230h],xmm0 002B062D movapd xmm0,xmmword ptr [eax+3300D0h] 002B0635 movaps xmmword ptr [esp+220h],xmm0 002B063D movapd xmm0,xmmword ptr [eax+3300E0h] 002B0645 movaps xmmword ptr [esp+210h],xmm0 002B064D movapd xmm0,xmmword ptr [eax+3300F0h] 002B0655 movaps xmmword ptr [esp+200h],xmm0 002B065D movapd xmm0,xmmword ptr [eax+330100h] 002B0665 movaps xmmword ptr [esp+1F0h],xmm0 002B066D movapd xmm0,xmmword ptr [eax+330110h] 002B0675 movaps xmmword ptr [esp+1E0h],xmm0 002B067D movapd xmm0,xmmword ptr [eax+330120h] 002B0685 movaps xmmword ptr [esp+1D0h],xmm0 002B068D movapd xmm0,xmmword ptr [eax+330130h] 002B0695 movaps xmmword ptr [esp+1C0h],xmm0 002B069D movapd xmm0,xmmword ptr [eax+330140h] 002B06A5 movaps xmmword ptr [esp+1B0h],xmm0 002B06AD movapd xmm0,xmmword ptr [eax+330150h] 002B06B5 movaps xmmword ptr [esp+1A0h],xmm0 002B06BD movaps xmm0,xmmword ptr [esp+230h] 002B06C5 mulpd xmm0,xmm5 002B06C9 movaps xmm5,xmmword ptr [esp+240h] 002B06D1 mulpd xmm5,xmm4 002B06D5 movaps xmm4,xmmword ptr [esp+250h] 002B06DD mulpd xmm4,xmm1 002B06E1 addpd xmm4,xmm5 002B06E5 addpd xmm4,xmm0 002B06E9 movaps xmm0,xmmword ptr [esp+220h] 002B06F1 addpd xmm0,xmm4 002B06F5 movaps xmm1,xmmword ptr [esp+230h] 002B06FD movaps xmm4,xmmword ptr [esp+1F0h] 002B0705 mulpd xmm1,xmm4 002B0709 movaps xmm4,xmmword ptr [esp+240h] 
002B0711 movaps xmm5,xmmword ptr [esp+200h] 002B0719 mulpd xmm4,xmm5 002B071D movaps xmm5,xmmword ptr [esp+250h] 002B0725 movaps xmmword ptr [esp+190h],xmm0 002B072D movaps xmm0,xmmword ptr [esp+210h] 002B0735 mulpd xmm5,xmm0 002B0739 addpd xmm5,xmm4 002B073D addpd xmm5,xmm1 002B0741 movaps xmm0,xmmword ptr [esp+1E0h] 002B0749 addpd xmm0,xmm5 002B074D movaps xmm1,xmmword ptr [esp+230h] 002B0755 movaps xmm4,xmmword ptr [esp+1B0h] 002B075D mulpd xmm1,xmm4 002B0761 movaps xmm4,xmmword ptr [esp+240h] 002B0769 movaps xmm5,xmmword ptr [esp+1C0h] 002B0771 mulpd xmm4,xmm5 002B0775 movaps xmm5,xmmword ptr [esp+250h] 002B077D movaps xmmword ptr [esp+180h],xmm0 002B0785 movaps xmm0,xmmword ptr [esp+1D0h] 002B078D mulpd xmm5,xmm0 002B0791 addpd xmm5,xmm4 002B0795 addpd xmm5,xmm1 002B0799 movaps xmm0,xmmword ptr [esp+1A0h] 002B07A1 addpd xmm0,xmm5 002B07A5 movapd xmm1,xmmword ptr ds:[2B0020h] 002B07AD movaps xmm4,xmmword ptr [esp+190h] 002B07B5 mulpd xmm4,xmm1 002B07B9 mulpd xmm4,xmm4 002B07BD movaps xmm5,xmmword ptr [esp+180h] 002B07C5 mulpd xmm5,xmm1 002B07C9 mulpd xmm5,xmm5 002B07CD movaps xmmword ptr [esp+170h],xmm0 002B07D5 mulpd xmm0,xmm1 002B07D9 mulpd xmm0,xmm0 002B07DD addpd xmm4,xmm5 002B07E1 addpd xmm4,xmm0 002B07E5 movapd xmm0,xmmword ptr ds:[2B0030h] 002B07ED movaps xmm1,xmm0 002B07F0 subpd xmm1,xmm4 002B07F4 movapd xmm4,xmmword ptr ds:[2B0040h] 002B07FC movaps xmm5,xmmword ptr [esp+190h] 002B0804 addpd xmm5,xmm4 002B0808 movapd xmm4,xmmword ptr ds:[2B0050h] 002B0810 movaps xmmword ptr [esp+160h],xmm0 002B0818 movaps xmm0,xmmword ptr [esp+180h] 002B0820 addpd xmm0,xmm4 002B0824 movaps xmm4,xmmword ptr [esp+170h] 002B082C mulpd xmm4,xmm6 002B0830 movaps xmmword ptr [esp+150h],xmm0 002B0838 movaps xmm0,xmm5 002B083B subpd xmm0,xmm4 002B083F mulpd xmm5,xmm6 002B0843 movaps xmm4,xmmword ptr [esp+170h] 002B084B addpd xmm5,xmm4 002B084F movapd xmm4,xmmword ptr ds:[2B0060h] 002B0857 movaps xmmword ptr [esp+140h],xmm0 002B085F mulpd xmm0,xmm4 002B0863 movaps xmmword ptr 
[esp+130h],xmm0 002B086B movapd xmm0,xmmword ptr ds:[2B0070h] 002B0873 movaps xmmword ptr [esp+120h],xmm0 002B087B movaps xmm0,xmmword ptr [esp+150h] 002B0883 movaps xmmword ptr [esp+110h],xmm1 002B088B movaps xmm1,xmmword ptr [esp+120h] 002B0893 mulpd xmm0,xmm1 002B0897 movaps xmm1,xmmword ptr [esp+130h] 002B089F addpd xmm0,xmm1 002B08A3 movaps xmm1,xmmword ptr [esp+140h] 002B08AB movaps xmmword ptr [esp+100h],xmm0 002B08B3 movaps xmm0,xmmword ptr [esp+120h] 002B08BB mulpd xmm1,xmm0 002B08BF movaps xmm0,xmmword ptr [esp+150h] 002B08C7 mulpd xmm0,xmm4 002B08CB subpd xmm0,xmm1 002B08CF movaps xmm1,xmm5 002B08D2 mulpd xmm1,xmm6 002B08D6 addpd xmm1,xmm0 002B08DA mulpd xmm0,xmm6 002B08DE subpd xmm5,xmm0 002B08E2 movaps xmm0,xmmword ptr [esp+100h] 002B08EA addpd xmm0,xmm6 002B08EE addpd xmm1,xmm6 002B08F2 addpd xmm5,xmm6 002B08F6 movapd xmm4,xmmword ptr ds:[2B0080h] 002B08FE mulpd xmm0,xmm4 002B0902 mulpd xmm0,xmm0 002B0906 mulpd xmm1,xmm4 002B090A mulpd xmm1,xmm1 002B090E mulpd xmm5,xmm4 002B0912 mulpd xmm5,xmm5 002B0916 addpd xmm0,xmm1 002B091A addpd xmm0,xmm5 002B091E movaps xmm1,xmmword ptr [esp+160h] 002B0926 subpd xmm1,xmm0 002B092A movaps xmm0,xmmword ptr [esp+110h] 002B0932 mulpd xmm0,xmm0 002B0936 movaps xmm4,xmm1 002B0939 mulpd xmm4,xmm4 002B093D addpd xmm0,xmm4 002B0941 movaps xmm4,xmm0 002B0944 movaps xmm5,xmmword ptr [esp+160h] 002B094C addpd xmm4,xmm5 002B0950 movaps xmm6,xmm4 002B0953 movsd xmm5,mmword ptr ds:[2B0090h] 002B095B movaps xmmword ptr [esp+0F0h],xmm0 002B0963 movaps xmm0,xmm5 002B0966 divsd xmm0,xmm6 002B096A unpckhpd xmm4,xmm4 002B096E movaps xmm6,xmm5 002B0971 divsd xmm6,xmm4 002B0975 unpcklpd xmm0,xmm6 002B0979 movapd xmm4,xmmword ptr ds:[2B00A0h] 002B0981 mulpd xmm0,xmm4 002B0985 movaps xmm4,xmmword ptr [esp+110h] 002B098D addpd xmm4,xmm1 002B0991 movaps xmm1,xmmword ptr [esp+0F0h] 002B0999 sqrtpd xmm6,xmm1 002B099D addpd xmm6,xmm4 002B09A1 addpd xmm0,xmm6 002B09A5 movaps xmm4,xmmword ptr [esp+410h] 002B09AD unpckhpd xmm4,xmm4 002B09B1 
movaps xmm6,xmm3 002B09B4 addsd xmm6,xmm4 002B09B8 movsd xmm1,mmword ptr [esp+448h] 002B09C1 addsd xmm1,xmm6 002B09C5 movaps xmm6,xmm0 002B09C8 mulpd xmm6,xmm6 002B09CC movaps xmmword ptr [esp+0E0h],xmm0 002B09D4 movaps xmm0,xmmword ptr [esp+490h] 002B09DC mulpd xmm0,xmm0 002B09E0 addpd xmm0,xmm6 002B09E4 movaps xmm6,xmmword ptr [esp+490h] 002B09EC movaps xmmword ptr [esp+0D0h],xmm0 002B09F4 movaps xmm0,xmmword ptr [esp+0E0h] 002B09FC addpd xmm6,xmm0 002B0A00 movaps xmm0,xmmword ptr [esp+0D0h] 002B0A08 sqrtpd xmm0,xmm0 002B0A0C addpd xmm0,xmm6 002B0A10 addsd xmm3,xmm5 002B0A14 movaps xmm6,xmm4 002B0A17 addsd xmm6,xmm5 002B0A1B movaps xmmword ptr [esp+0C0h],xmm0 002B0A23 movsd xmm0,mmword ptr [esp+448h] 002B0A2C addsd xmm0,xmm5 002B0A30 ucomisd xmm1,xmm7 002B0A34 setnp cl 002B0A37 sete ch 002B0A3A test cl,ch 002B0A3C movaps xmm1,xmmword ptr [esp+0E0h] 002B0A44 movsd mmword ptr [esp+0B8h],xmm7 002B0A4D movsd mmword ptr [esp+0B0h],xmm0 002B0A56 movaps xmmword ptr [esp+0A0h],xmm2 002B0A5E movsd mmword ptr [esp+98h],xmm4 002B0A67 movsd mmword ptr [esp+90h],xmm6 002B0A70 movsd mmword ptr [esp+88h],xmm5 002B0A79 movsd mmword ptr [esp+80h],xmm3 002B0A82 movaps xmmword ptr [esp+70h],xmm1 002B0A87 jne 002B0A9A 002B0A8D movaps xmm0,xmmword ptr [esp+0C0h] 002B0A95 movaps xmmword ptr [esp+70h],xmm0 002B0A9A movaps xmm0,xmmword ptr [esp+70h] 002B0A9F movsd xmm1,mmword ptr [esp+80h] 002B0AA8 movsd xmm2,mmword ptr [esp+88h] 002B0AB1 ucomisd xmm1,xmm2 002B0AB5 movsd xmm3,mmword ptr [esp+0B8h] 002B0ABE movaps xmmword ptr [esp+60h],xmm0 002B0AC3 movsd mmword ptr [esp+58h],xmm3 002B0AC9 jae 002B0ADE 002B0ACF movsd xmm0,mmword ptr [esp+80h] 002B0AD8 movsd mmword ptr [esp+58h],xmm0 002B0ADE movsd xmm0,mmword ptr [esp+58h] 002B0AE4 movsd xmm1,mmword ptr [esp+90h] 002B0AED movsd mmword ptr [esp+50h],xmm0 002B0AF3 movsd mmword ptr [esp+48h],xmm1 002B0AF9 jae 002B0B0E 002B0AFF movsd xmm0,mmword ptr [esp+98h] 002B0B08 movsd mmword ptr [esp+48h],xmm0 002B0B0E movsd xmm0,mmword ptr [esp+48h] 
002B0B14 movsd xmm1,mmword ptr [esp+88h] 002B0B1D ucomisd xmm0,xmm1 002B0B21 movsd xmm2,mmword ptr [esp+0B8h] 002B0B2A movsd mmword ptr [esp+40h],xmm0 002B0B30 movsd mmword ptr [esp+38h],xmm2 002B0B36 jae 002B0B48 002B0B3C movsd xmm0,mmword ptr [esp+40h] 002B0B42 movsd mmword ptr [esp+38h],xmm0 002B0B48 movsd xmm0,mmword ptr [esp+38h] 002B0B4E movaps xmm1,xmmword ptr [esp+410h] 002B0B56 movsd xmm2,mmword ptr [esp+50h] 002B0B5C movsd xmm1,xmm2 002B0B60 unpcklpd xmm1,xmm0 002B0B64 movsd xmm0,mmword ptr [esp+0B0h] 002B0B6D movaps xmmword ptr [esp+20h],xmm1 002B0B72 movsd mmword ptr [esp+18h],xmm0 002B0B78 jae 002B0B8D 002B0B7E movsd xmm0,mmword ptr [esp+448h] 002B0B87 movsd mmword ptr [esp+18h],xmm0 002B0B8D movsd xmm0,mmword ptr [esp+18h] 002B0B93 movsd xmm1,mmword ptr ds:[2B00B0h] 002B0B9B ucomisd xmm0,xmm1 002B0B9F movsd xmm1,mmword ptr [esp+0B8h] 002B0BA8 movsd mmword ptr [esp+10h],xmm0 002B0BAE movsd mmword ptr [esp+8],xmm1 002B0BB4 jae 002B0BC6 002B0BBA movsd xmm0,mmword ptr [esp+10h] 002B0BC0 movsd mmword ptr [esp+8],xmm0 002B0BC6 movsd xmm0,mmword ptr [esp+8] 002B0BCC movaps xmm1,xmmword ptr [esp+0A0h] 002B0BD4 movsd xmm1,xmm0 002B0BD8 movaps xmm0,xmmword ptr [esp+60h] 002B0BDD movaps xmm2,xmmword ptr [esp+20h] 002B0BE2 movaps xmmword ptr [esp+4B0h],xmm2 002B0BEA movaps xmmword ptr [esp+4C0h],xmm0 002B0BF2 movaps xmmword ptr [esp+4A0h],xmm1 002B0BFA jb 002B011D 002B0C00 jmp 002B0C05 002B0C05 movaps xmm0,xmmword ptr [esp+60h] 002B0C0A movaps xmm1,xmmword ptr [esp+60h] 002B0C0F movlpd qword ptr [esp+500h],xmm1 002B0C18 fld qword ptr [esp+500h] 002B0C1F movsd mmword ptr [esp],xmm0 002B0C24 mov esp,ebp 002B0C26 pop ebp 002B0C27 ret From gm4cheng at gmail.com Fri Jul 19 10:27:29 2013 From: gm4cheng at gmail.com (Peng Cheng) Date: Fri, 19 Jul 2013 13:27:29 -0400 Subject: [LLVMdev] road blocks in support shared library of llvm Message-ID: I am using llvm for a jit engine in a large project, in which multiple shared library modules link against llvm. 
Since llvm is recommended to be used as a static library, this leads to shared-library symbol conflicts, which trigger llvm assertions during global table lookup or insertion at shared-library load time. That forced me to hide all llvm symbols at link time. An alternative solution would be to use a shared library of llvm, but this package configuration is not supported on Windows, and not recommended for other OSes, according to the llvm docs; see http://llvm.org/docs/CMake.html#frequently-used-cmake-variables I can understand that the reason it is not supported on Windows is the DLL export/import declarations. Could you help me understand why it is not recommended for other OSes, and what needs to be done to fully support a shared library build of llvm? One reason I suspect is the global data in llvm, which is used extensively to store compiler options, machine types, optimization passes, ... If llvm is compiled as a shared library and multiple modules link against it, undesired results could happen when different modules use the global data for different purposes. Could anyone confirm that? Any other roadblocks? Thanks, -Peng -------------- next part -------------- An HTML attachment was scrubbed... URL: From craig.topper at gmail.com Fri Jul 19 12:30:30 2013 From: craig.topper at gmail.com (Craig Topper) Date: Fri, 19 Jul 2013 12:30:30 -0700 Subject: [LLVMdev] fptoui calling a function that modifies ECX In-Reply-To: <51E90826.3060201@uformia.com> References: <51E5F5ED.1000808@uformia.com> <51E789ED.2080509@uformia.com> <51E8BBE9.8020407@uformia.com> <51E8CADA.1040506@uformia.com> <51E8CE40.3050703@uformia.com> <51E8D26B.3000108@uformia.com> <51E8D468.7030509@uformia.com> <51E8DE10.9090900@uformia.com> <51E8E123.2030402@uformia.com> <51E8E3D0.5040807@uformia.com> <51E8E994.3030400@uformia.com> <51E90826.3060201@uformia.com> Message-ID: Here's my attempt at a fix. Adding Jakob to make sure I did this right.
On Fri, Jul 19, 2013 at 2:34 AM, Peter Newman wrote: > That does appear to have worked. All my tests are passing now. > > I'll hand this out to our other devs & testers and make sure it's working > for them as well (not just on my machine). > > Thank you, again. > > -- > Peter N > > > On 19/07/2013 5:45 PM, Craig Topper wrote: > > I don't think that's going to work. > > > On Fri, Jul 19, 2013 at 12:24 AM, Peter Newman wrote: > >> Thank you, I'm trying this now. >> >> >> On 19/07/2013 5:23 PM, Craig Topper wrote: >> >> Try adding ECX to the Defs of this part of >> lib/Target/X86/X86InstrCompiler.td like I've done below. I don't have a >> Windows machine to test myself. >> >> let Defs = [EAX, EDX, ECX, EFLAGS], FPForm = SpecialFP in { >> def WIN_FTOL_32 : I<0, Pseudo, (outs), (ins RFP32:$src), >> "# win32 fptoui", >> [(X86WinFTOL RFP32:$src)]>, >> Requires<[In32BitMode]>; >> >> def WIN_FTOL_64 : I<0, Pseudo, (outs), (ins RFP64:$src), >> "# win32 fptoui", >> [(X86WinFTOL RFP64:$src)]>, >> Requires<[In32BitMode]>; >> } >> >> >> On Thu, Jul 18, 2013 at 11:59 PM, Peter Newman wrote: >> >>> Oh, excellent point, I agree. My bad. Now that I'm not assuming those >>> are the sqrt, I see the sqrtpd's in the output. Also there are three >>> fptoui's and there are 3 call instances. >>> >>> (Changing subject line again.) >>> >>> Now it looks like it's bug #13862 >>> >>> On 19/07/2013 4:51 PM, Craig Topper wrote: >>> >>> I think those calls correspond to this >>> >>> %110 = fptoui double %109 to i32 >>> >>> The calls are followed by an imul with 12 which matches up with what >>> occurs right after the fptoui in the IR. >>> >>> >>> On Thu, Jul 18, 2013 at 11:48 PM, Peter Newman wrote: >>> >>>> Yes, that is the result of module-dump.ll >>>> >>>> >>>> On 19/07/2013 4:46 PM, Craig Topper wrote: >>>> >>>> Does this correspond to one of the .ll files you sent earlier? 
>>>> >>>> >>>> On Thu, Jul 18, 2013 at 11:34 PM, Peter Newman wrote: >>>> >>>>> (Changing subject line as diagnosis has changed) >>>>> >>>>> I'm attaching the compiled code that I've been getting, both with >>>>> CodeGenOpt::Default and CodeGenOpt::None. The crash isn't occurring with >>>>> CodeGenOpt::None, but that seems to be because ECX isn't being used - it >>>>> still gets set to 0x7fffffff by one of the calls to 76719BA1 >>>>> >>>>> I notice that X86::SQRTPD[m|r] appear in >>>>> X86InstrInfo::isHighLatencyDef. I was thinking an optimization might be >>>>> removing it, but I don't get the sqrtpd instruction even if the createJIT >>>>> optimization level is turned off. >>>>> >>>>> I am trying this with the Release 3.3 code - I'll try it with trunk >>>>> and see if I get a different result there. Maybe there was a recent commit >>>>> for this. >>>>> >>>>> -- >>>>> Peter N >>>>> >>>>> On 19/07/2013 4:00 PM, Craig Topper wrote: >>>>> >>>>> Hmm, I'm not able to get those .ll files to compile if I disable SSE, >>>>> and I end up with SSE instructions (including sqrtpd) if I don't disable it. >>>>> >>>>> >>>>> On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman wrote: >>>>> >>>>>> Is there something specifically required to enable SSE? If it's not >>>>>> detected as available (based on the target triple?) then I don't think we >>>>>> enable it specifically. >>>>>> >>>>>> Also it seems that it should handle converting to/from the vector >>>>>> types, although I can see it getting confused about needing to do that if >>>>>> it thinks SSE isn't available at all. >>>>>> >>>>>> >>>>>> On 19/07/2013 3:47 PM, Craig Topper wrote: >>>>>> >>>>>> Hmm, maybe sse isn't being enabled so it's falling back to emulating >>>>>> sqrt?
>>>>>> >>>>>> >>>>>> On Thu, Jul 18, 2013 at 10:45 PM, Peter Newman wrote: >>>>>> >>>>>>> In the disassembly, I'm seeing three cases of >>>>>>> call 76719BA1 >>>>>>> >>>>>>> I am assuming this is the sqrt function as this is the only function >>>>>>> called in the LLVM IR. >>>>>>> >>>>>>> The code at 76719BA1 is: >>>>>>> >>>>>>> 76719BA1 push ebp >>>>>>> 76719BA2 mov ebp,esp >>>>>>> 76719BA4 sub esp,20h >>>>>>> 76719BA7 and esp,0FFFFFFF0h >>>>>>> 76719BAA fld st(0) >>>>>>> 76719BAC fst dword ptr [esp+18h] >>>>>>> 76719BB0 fistp qword ptr [esp+10h] >>>>>>> 76719BB4 fild qword ptr [esp+10h] >>>>>>> 76719BB8 mov edx,dword ptr [esp+18h] >>>>>>> 76719BBC mov eax,dword ptr [esp+10h] >>>>>>> 76719BC0 test eax,eax >>>>>>> 76719BC2 je 76719DCF >>>>>>> 76719BC8 fsubp st(1),st >>>>>>> 76719BCA test edx,edx >>>>>>> 76719BCC js 7671F9DB >>>>>>> 76719BD2 fstp dword ptr [esp] >>>>>>> 76719BD5 mov ecx,dword ptr [esp] >>>>>>> 76719BD8 add ecx,7FFFFFFFh >>>>>>> 76719BDE sbb eax,0 >>>>>>> 76719BE1 mov edx,dword ptr [esp+14h] >>>>>>> 76719BE5 sbb edx,0 >>>>>>> 76719BE8 leave >>>>>>> 76719BE9 ret >>>>>>> >>>>>>> >>>>>>> As you can see at 76719BD5, it modifies ECX . >>>>>>> >>>>>>> I don't know that this is the sqrtpd function (for example, I'm not >>>>>>> seeing any SSE instructions here?) but whatever it is, it's being called >>>>>>> from the IR I attached earlier, and is modifying ECX under some >>>>>>> circumstances. >>>>>>> >>>>>>> >>>>>>> On 19/07/2013 3:29 PM, Craig Topper wrote: >>>>>>> >>>>>>> That should map directly to sqrtpd which can't modify ecx. >>>>>>> >>>>>>> >>>>>>> On Thu, Jul 18, 2013 at 10:27 PM, Peter Newman wrote: >>>>>>> >>>>>>>> Sorry, that should have been llvm.x86.sse2.sqrt.pd >>>>>>>> >>>>>>>> >>>>>>>> On 19/07/2013 3:25 PM, Craig Topper wrote: >>>>>>>> >>>>>>>> What is "frep.x86.sse2.sqrt.pd". I'm only familiar with things >>>>>>>> prefixed with "llvm.x86". 
>>>>>>>> >>>>>>>> >>>>>>>> On Thu, Jul 18, 2013 at 10:12 PM, Peter Newman wrote: >>>>>>>> >>>>>>>>> After stepping through the produced assembly, I believe I have a >>>>>>>>> culprit. >>>>>>>>> >>>>>>>>> One of the calls to @frep.x86.sse2.sqrt.pd is modifying the value >>>>>>>>> of ECX - while the produced code is expecting it to still contain its >>>>>>>>> previous value. >>>>>>>>> >>>>>>>>> Peter N >>>>>>>>> >>>>>>>>> >>>>>>>>> On 19/07/2013 2:09 PM, Peter Newman wrote: >>>>>>>>> >>>>>>>>> I've attached the module->dump() that our code is producing. >>>>>>>>> Unfortunately this is the smallest test case I have available. >>>>>>>>> >>>>>>>>> This is before any optimization passes are applied. There are two >>>>>>>>> separate modules in existence at the time, and there are no guarantees >>>>>>>>> about the order the surrounding code calls those functions, so there may be >>>>>>>>> some interaction between them? There shouldn't be, they don't refer to any >>>>>>>>> common memory etc. There is no multi-threading occurring. >>>>>>>>> >>>>>>>>> The function in module-dump.ll (called crashfunc in this file) is >>>>>>>>> called with >>>>>>>>> - func_params 0x0018f3b0 double [3] >>>>>>>>> [0x0] -11.339976634695301 double >>>>>>>>> [0x1] -9.7504239056205506 double >>>>>>>>> [0x2] -5.2900856817382804 double >>>>>>>>> at the time of the exception. >>>>>>>>> >>>>>>>>> This is compiled on a "i686-pc-win32" triple. All of the >>>>>>>>> non-intrinsic functions referred to in these modules are the standard >>>>>>>>> equivalents from the MSVC library (e.g. @asin is the standard C lib >>>>>>>>> double asin( double ) ). >>>>>>>>> >>>>>>>>> Hopefully this is reproducible for you. >>>>>>>>> >>>>>>>>> -- >>>>>>>>> PeterN >>>>>>>>> >>>>>>>>> On 18/07/2013 4:37 PM, Craig Topper wrote: >>>>>>>>> >>>>>>>>> Are you able to send any IR for others to reproduce this issue? 
>>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman wrote: >>>>>>>>>> Unfortunately, this doesn't appear to be the bug I'm hitting. I >>>>>>>>>> applied the fix to my source and it didn't make a difference. >>>>>>>>>> >>>>>>>>>> Also, further testing found me getting the same behavior with >>>>>>>>>> other SIMD instructions. The common factor is that in each case, ECX is set to >>>>>>>>>> 0x7fffffff, and it's an operation using xmm ptr ecx+offset. >>>>>>>>>> >>>>>>>>>> Additionally, turning the optimization level passed to createJIT >>>>>>>>>> down appears to avoid it, so I'm now leaning towards a bug in one of the >>>>>>>>>> optimization passes. >>>>>>>>>> >>>>>>>>>> I'm going to dig through the passes controlled by that parameter >>>>>>>>>> and see if I can narrow down which optimization is causing it. >>>>>>>>>> >>>>>>>>>> Peter N >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 17/07/2013 1:58 PM, Solomon Boulos wrote: >>>>>>>>>> >>>>>>>>>>> As someone off-list just told me, perhaps my new bug is the same >>>>>>>>>>> issue: >>>>>>>>>>> >>>>>>>>>>> http://llvm.org/bugs/show_bug.cgi?id=16640 >>>>>>>>>>> >>>>>>>>>>> Do you happen to be using FastISel? >>>>>>>>>>> >>>>>>>>>>> Solomon >>>>>>>>>>> >>>>>>>>>>> On Jul 16, 2013, at 6:39 PM, Peter Newman >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Hello all, >>>>>>>>>>>> >>>>>>>>>>>> I'm currently in the process of debugging a crash occurring in >>>>>>>>>>>> our program. In LLVM 3.2 and 3.3 it appears that JIT-generated code is >>>>>>>>>>>> attempting to access unaligned memory with an SSE2 instruction. >>>>>>>>>>>> However, this only happens under certain conditions that seem (but may not >>>>>>>>>>>> be) related to the stack's state on calling the function. >>>>>>>>>>>> >>>>>>>>>>>> Our program acts as a front-end, using the LLVM C++ API to >>>>>>>>>>>> generate a JIT-compiled function.
This function is primarily mathematical, >>>>>>>>>>>> so we use the Vector types to take advantage of SIMD instructions (as well >>>>>>>>>>>> as a few SSE2 intrinsics). >>>>>>>>>>>> >>>>>>>>>>>> This worked in LLVM 2.8 but started failing in 3.2 and has >>>>>>>>>>>> continued to fail in 3.3. It fails with no optimizations applied to the >>>>>>>>>>>> LLVM Function/Module. It crashes with what is reported as a memory access >>>>>>>>>>>> error (accessing 0xffffffff); however, it's suggested that this is how the >>>>>>>>>>>> SSE fault-raising mechanism appears. >>>>>>>>>>>> >>>>>>>>>>>> The generated instruction varies, but it seems to often be >>>>>>>>>>>> similar to (I don't have it in front of me, sorry): >>>>>>>>>>>> movapd xmm0, xmm[ecx+0x???????] >>>>>>>>>>>> Where the xmm register changes, and the second parameter is a >>>>>>>>>>>> memory access. >>>>>>>>>>>> ECX is always set to 0x7fffffff - however I don't know if this >>>>>>>>>>>> is part of the SSE error reporting process or is part of the situation >>>>>>>>>>>> causing the error. >>>>>>>>>>>> >>>>>>>>>>>> I haven't worked out exactly what code path etc. is causing this >>>>>>>>>>>> crash. I'm hoping that someone can tell me if there were any changed >>>>>>>>>>>> requirements for working with SIMD in LLVM 3.2 (or earlier, we haven't >>>>>>>>>>>> tried 3.0 or 3.1). I currently suspect the use of GlobalVariable (we first >>>>>>>>>>>> discovered the crash when using a feature that uses them); however, I have >>>>>>>>>>>> attempted using setAlignment on the GlobalVariables without any change.
>>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Peter N >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> LLVM Developers mailing list >>>>>>>>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -- ~Craig -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ftol.patch Type: application/octet-stream Size: 1262 bytes Desc: not available URL: From pichet2000 at gmail.com Fri Jul 19 13:14:23 2013 From: pichet2000 at gmail.com (Francois Pichet) Date: Fri, 19 Jul 2013 16:14:23 -0400 Subject: [LLVMdev] Disable vectorization for unaligned data Message-ID: What is the proper solution to disable auto-vectorization for unaligned data? I have an out-of-tree target and I added this: bool OpusTargetLowering::allowsUnalignedMemoryAccesses(EVT VT, bool *Fast) const { if (VT.isVector()) return false; .... } After that, I could see that vectorization is still done on unaligned data, except that llvm will copy the data back and forth from the source to the top of the stack and work from there. This is very costly; I would rather get scalar operations.
Then I tried to add:

unsigned getMemoryOpCost(unsigned Opcode, Type *Src,
                         unsigned Alignment,
                         unsigned AddressSpace) const {
  if (Src->isVectorTy() && Alignment != 16)
    return 10000; // <== high number to try to avoid unaligned load/store.
  return TargetTransformInfo::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace);
}

Except that this doesn't work, because Alignment will always be 4 even for data like:
int data[16][16] __attribute__ ((aligned (16))),
because individual elements are still 4-byte aligned.

I am not sure what the right way to do it is.
Thanks.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From eli.friedman at gmail.com  Fri Jul 19 13:32:50 2013
From: eli.friedman at gmail.com (Eli Friedman)
Date: Fri, 19 Jul 2013 13:32:50 -0700
Subject: [LLVMdev] Disable vectorization for unaligned data
In-Reply-To: 
References: 
Message-ID: 

On Fri, Jul 19, 2013 at 1:14 PM, Francois Pichet wrote:
>
> What is the proper solution to disable auto-vectorization for unaligned
> data?

Why are you trying to do this?  If auto-vectorization is making a
given loop slower on your target, that means the cost metrics are off,
and we should fix them.  If code size is an issue, you should tell the
optimizer that you want to optimize for size.

-Eli

> I have an out-of-tree target and I added this:
>
> bool OpusTargetLowering::allowsUnalignedMemoryAccesses(EVT VT, bool *Fast)
> const {
>   if (VT.isVector())
>     return false;
>   ....
> }
>
> After that, I could see that vectorization is still done on unaligned data
> except that llvm will copy the data back and forth from the source to the
> top of the stack and work from there. This is very costly; I'd rather get
> scalar operations.
>
> Then I tried to add:
> unsigned getMemoryOpCost(unsigned Opcode, Type *Src,
>                          unsigned Alignment,
>                          unsigned AddressSpace) const {
>   if (Src->isVectorTy() && Alignment != 16)
>     return 10000; // <== high number to try to avoid unaligned load/store.
>   return TargetTransformInfo::getMemoryOpCost(Opcode, Src, Alignment,
> AddressSpace);
> }
>
> Except that this doesn't work, because Alignment will always be 4 even for
> data like:
> int data[16][16] __attribute__ ((aligned (16))),
> because individual elements are still 4-byte aligned.
>
> I am not sure what the right way to do it is.
> Thanks.
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

From pichet2000 at gmail.com  Fri Jul 19 13:39:42 2013
From: pichet2000 at gmail.com (Francois Pichet)
Date: Fri, 19 Jul 2013 16:39:42 -0400
Subject: [LLVMdev] Disable vectorization for unaligned data
In-Reply-To: 
References: 
Message-ID: 

Because unaligned load/store are illegal on my target.
But ExpandUnalignedStore expands into too many loads/stores. It seems that
ExpandUnalignedStore is called after the vectorization cost analysis is
done, so its cost is not taken into account.

On Fri, Jul 19, 2013 at 4:32 PM, Eli Friedman wrote:

> On Fri, Jul 19, 2013 at 1:14 PM, Francois Pichet
> wrote:
> >
> > What is the proper solution to disable auto-vectorization for unaligned
> > data?
>
> Why are you trying to do this?  If auto-vectorization is making a
> given loop slower on your target, that means the cost metrics are off,
> and we should fix them.  If code size is an issue, you should tell the
> optimizer that you want to optimize for size.
>
> -Eli
>
> > I have an out-of-tree target and I added this:
> >
> > bool OpusTargetLowering::allowsUnalignedMemoryAccesses(EVT VT, bool
> *Fast)
> > const {
> >   if (VT.isVector())
> >     return false;
> >   ....
> > }
> >
> > After that, I could see that vectorization is still done on unaligned
> data
> > except that llvm will copy the data back and forth from the source to the
> > top of the stack and work from there. This is very costly; I'd rather get
> > scalar operations.
> > > > Then I tried to add: > > unsigned getMemoryOpCost(unsigned Opcode, Type *Src, > > unsigned Alignment, > > unsigned AddressSpace) const { > > if (Src->isVectorTy() && Alignment != 16) > > return 10000; // <== high number to try to avoid unaligned > load/store. > > return TargetTransformInfo::getMemoryOpCost(Opcode, Src, Alignment, > > AddressSpace); > > } > > > > Except that this doesn't work because Alignment will always be 4 even for > > data like: > > int data[16][16] __attribute__ ((aligned (16))), > > > > Because individual element are still 4-byte aligned. > > > > I am not sure what is the right way to do it? > > Thanks. > > > > > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stoklund at 2pi.dk Fri Jul 19 16:31:49 2013 From: stoklund at 2pi.dk (Jakob Stoklund Olesen) Date: Fri, 19 Jul 2013 16:31:49 -0700 Subject: [LLVMdev] fptoui calling a function that modifies ECX In-Reply-To: References: <51E5F5ED.1000808@uformia.com> <51E789ED.2080509@uformia.com> <51E8BBE9.8020407@uformia.com> <51E8CADA.1040506@uformia.com> <51E8CE40.3050703@uformia.com> <51E8D26B.3000108@uformia.com> <51E8D468.7030509@uformia.com> <51E8DE10.9090900@uformia.com> <51E8E123.2030402@uformia.com> <51E8E3D0.5040807@uformia.com> <51E8E994.3030400@uformia.com> <51E90826.3060201@uformi> Message-ID: <4C02CABA-C832-4A5E-8354-7D956CABCE15@2pi.dk> On Jul 19, 2013, at 12:30 PM, Craig Topper wrote: > Here's my attempt at a fix. Adding Jakob to make sure I did this right. Patch LGTM. Thanks, /jakob From qcolombet at apple.com Fri Jul 19 17:13:49 2013 From: qcolombet at apple.com (Quentin Colombet) Date: Fri, 19 Jul 2013 17:13:49 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. 
In-Reply-To: <7BFDA791-4915-4785-8179-8D9D6C5367B5@apple.com>
References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com>
 <643A0F0E-DA20-4E47-B84F-5BE6912E5145@apple.com>
 <7BFDA791-4915-4785-8179-8D9D6C5367B5@apple.com>
Message-ID: 

Hi,

As no new suggestions came in, I would like to share my plans to move forward.
— Feedback is very welcome!

--

Like I summed up in my previous email (see below), there are two orthogonal approaches to this problem.
I have chosen the second one (extend the reporting already used for inline asm).

Regarding the first approach (front-end querying the back-end to build up diagnostics), it needs further discussion if anyone wants to go in that direction. Although I think this is interesting, I leave that for someone else.

** Proposed Plan **

Extend what already exists: the back-end reports to the front-end through hooks.

The detailed plan is the following (still hand-wavy; I did not look into the actual code base, just had some high-level discussions):
1. LLVM: Extend the current diagnostic hook to pass:
   1.1. A kind of diagnostic (InlineAsm, StackSize, Other).
   1.2. A boolean hinting whether the diagnostic is an error or a warning.
2. Clang: Map the diagnostic to the proper (pre-defined) group based on the passed kind.

** Remarks **

1.1. This is supposed to change rarely, which is why a static approach should do the trick. The ‘Other’ kind represents a fall-back case for things we did not anticipate (do not get too hung up on the names; they still have to be defined (name, which kinds of back-end diagnostic we want to expose, etc.)).
1.2. Not every front-end has clang's capabilities, so this boolean will help the front-end figure out what it should do with the diagnostic (e.g., keep going or abort). Clang may ignore it.
2. Obviously, the kinds of diagnostic have to be shared between Clang and LLVM. This is the awkward part, but again, it should rarely change.
Thanks again for all the contributions made so far and for the future ones as well! Cheers, -Quentin On Jul 17, 2013, at 11:34 AM, Quentin Colombet wrote: > Hi, > > Thanks all for the insight comments. > > Let me sum up at a high level what proposals we actually have (sorry if I misinterpreted or missed something, do not hesitate to correct me): > > 1. Make LLVM defines some APIs that exposes some internal information so that a front-end can use them to build-up some diagnostics. > > 2. Make LLVM builds up a diagnostic and let a front-end maps this diagnostic to the right warning/error group. > > > In my opinion, both approaches are orthogonal and have different advantages based on what goal we are pursuing. > > To be more specific, with the first approach, front-end people can come up with new warnings/errors and diagnostics without having to modify the back-end. On the other hand, with the second approach, back-end people can emit new warnings/errors without having to modify the front-end (BTW, it would be nice if we come up at least in clang with a consistent way to pass down options for emitting back-end warning/error when applicable, i.e., without -mllvm but with -W). > > What people are thinking? > > Thanks again for sharing your thoughts, I appreciate. > > Cheers, > > -Quentin > > On Jul 17, 2013, at 9:38 AM, Evan Cheng wrote: > >> >> >> Sent from my iPad >> >> On Jul 17, 2013, at 8:53 AM, Bob Wilson wrote: >> >>> >>> On Jul 17, 2013, at 2:12 AM, Chandler Carruth wrote: >>> >>>> On Tue, Jul 16, 2013 at 9:34 PM, Bob Wilson wrote: >>>> >>>> On Jul 16, 2013, at 5:51 PM, Eli Friedman wrote: >>>> >>>>> On Tue, Jul 16, 2013 at 5:21 PM, Quentin Colombet wrote: >>>>>> ** Advices Needed ** >>>>>> >>>>>> 1. Decide whether or not we want such capabilities (if we do not we may just >>>>>> add sporadically the support for a new warning/group of warning/error). >>>>>> 2. Come up with a plan to implement that (assuming we want it). 
>>>>> >>>>> The frontend should be presenting warnings, not the backend; adding a >>>>> hook which provides the appropriate information shouldn't be too hard. >>>>> Warnings coming out of the backend are very difficult to design well, >>>>> so I don't expect we will add many. Also, keep in mind that the >>>>> information coming out of the backend could be used in other ways; it >>>>> might not make sense for the backend to decide that some piece of >>>>> information should be presented as a warning. (Consider, for example, >>>>> IDE integration to provide additional information about functions and >>>>> loops on demand.) >>>> >>>> I think we definitely need this. In fact, I tried adding something simple earlier this year but gave up when I realized that the task was bigger than I expected. We already have a hook for diagnostics that can be easily extended to handle warnings as well as errors (which is what I tried earlier), but the problem is that it is hardwired for inline assembly errors. To do this right, new warnings really need to be associated with warning groups so that can be controlled from the front-end. >>>> >>>> I agree with Eli that there probably won’t be too many of these. Adding a few new entries to clang’s diagnostic .td files would be fine, except that the backend doesn’t see those. It seems like we will need something in llvm that defines a set of “backend diagnostics”, along with a table in the frontend to correlate those with the corresponding clang diagnostics. That seems awkward at best but maybe it’s tolerable as long as there aren’t many of them. >>>> >>>> I actually think this is the wrong approach, and I don't think it's quite what Eli or I am suggestion (of course, Eli may want to clarify, I'm only really clarifying what *I'm* suggesting. >>>> >>>> I think all of the warnings should be in the frontend, using the standard and existing machinery for generating, controlling, and displaying a warning. We already know how to do that well. 
The difference is that these warnings will need to query the LLVM layer for detailed information through some defined API, and base the warning on this information. This accomplishes two things: >>>> >>>> 1) It ensures the warning machinery is simple, predictable, and integrates cleanly with everything else in Clang. It does so in the best way by simply being the existing machinery. >>>> >>>> 2) It forces us to design reasonable APIs in LLVM to expose to a FE for this information. A consequence of this will be to sort out the layering issues, etc. Another consequence will be a strong chance of finding general purpose APIs in LLVM that can serve many purposes, not just a warning. Consider JITs and other systems that might benefit from having good APIs for querying the size and makeup (at a high level) of a generated function. >>>> >>>> A nice side-effect is that it simplifies the complexity involved for simple warnings -- now it merely is the complexity of exposing the commensurately simple API in LLVM. If instead we go the route of threading a FE interface for *reporting* warnings into LLVM, we have to thread an interface with sufficient power to express many different concepts. >>> >>> I don't understand what you are proposing. >>> >>> First, let me try to clarify my proposal, in case there was any confusion about that. LLVMContext already has a hook for diagnostics, setInlineAsmDiagnosticHandler() et al. I was suggesting that we rename those interfaces to be more generic, add a simple enumeration of whatever diagnostics can be produced from the backend, and add support in clang for mapping those enumeration values to the corresponding clang diagnostics. This would be a small amount of work and would also be consistent with everything you wrote above about reusing the standard and existing machinery for diagnostics in clang. 
For the record, I had started down that path in svn commits 171041 and 171047, but I reverted those changes in 174748 and 174860, since they didn't go far enough to make it work properly and it wasn't clear at the time whether we really needed it. >>> >>> Now let me try to understand what you're suggesting…. You refer several times to having clang query the LLVM layer. Is this to determine whether to emit a diagnostic for some condition? How would this work? Would you have clang insert extra passes to check for various conditions that might require diagnostics? I don't see how else you would do it, since clang's interface to the backend just sets up the PerFunctionPasses, PerModulePasses and CodeGenPasses pass managers and then runs them. Assuming you did add some special passes to check for problems, wouldn't those passes have to duplicate a lot of effort in some cases to find the answers? Take for example the existing warnings in IntrinsicLowering::LowerIntrinsicCall. Those badly need to be cleaned up. Would clang run a special pass to check for intrinsics that are not supported by the target? That pass would need to be implemented as part of clang so that it would have access to clang's diagnostic machinery, but it would also need to know details about what intrinsics are supported by the target. Interesting layering problems there…. Apologies if I'm misinterpreting your proposal. >> >> We can't assume clang is the frontend or design a system that only works with clang. There are many systems that use llvm which are not even c compilers. 
>> >> Evan
>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From chandlerc at google.com  Fri Jul 19 17:48:44 2013
From: chandlerc at google.com (Chandler Carruth)
Date: Fri, 19 Jul 2013 17:48:44 -0700
Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM.
In-Reply-To: 
References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com>
 <643A0F0E-DA20-4E47-B84F-5BE6912E5145@apple.com>
 <7BFDA791-4915-4785-8179-8D9D6C5367B5@apple.com>
Message-ID: 

Sorry for the delays, just haven't been able to get back around to this. My silence isn't about not caring.

On Fri, Jul 19, 2013 at 5:13 PM, Quentin Colombet wrote:
> Like I summed up in my previous email (see below), there are two
> orthogonal approaches to this problem.
> I have chosen the second one (extend the reporting already used for inline
> asm).
>
> Regarding the first approach (front-end querying the back-end to build up
> diagnostic), it needs further discussion if anyone wants to go in that
> direction. Although I think this is interesting, I leave that for someone
> else.

I really don't like the second approach.

Fundamentally, it increasingly couples LLVM to Clang's warning infrastructure and I think that's a bad thing.
I think we are better served by designing good interfaces for the frontend and the backend to communicate in order for different frontends to provide different types and kinds of information to users based on their needs.

Bob described one potential API for this, but I think there are much better ones. I don't think we have to be constrained by the idea of using a pass to compute and communicate this information -- there are other approaches. I haven't thought about them because I haven't designed such an interface. If I had, I would just contribute it. I do think that this is the correct design direction though, as I don't think that there is one unified way to represent diagnostics in the backend, or if there is I don't think we have any idea what it looks like.

Inline assembly is more of a special case than anything else. I don't really see why the right path forward is to design a generic framework around what was originally built for its specific purpose.

If anything, I think the design you are pursuing is strictly more complex and invasive, and without any tangible benefits as a consequence.

-Chandler
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From eli.friedman at gmail.com  Fri Jul 19 18:36:48 2013
From: eli.friedman at gmail.com (Eli Friedman)
Date: Fri, 19 Jul 2013 18:36:48 -0700
Subject: [LLVMdev] Request to review patch for bug #14792
In-Reply-To: 
References: 
Message-ID: 

On Thu, Jul 18, 2013 at 10:31 PM, Yun-Wei Lee wrote:
> Thanks for your suggestion.
>
> I have tried to use GetAlignedArgumentStackSize.
> It is so strange that this function may output incorrectly.
> For example, the StackSize will be 40 when handling the third function call
> in main.
> If I modify this function, it will fail one test.
> I wonder if this may be another bug in llvm...
> And that's why I didn't use GetAlignedArgumentStackSize previously.
>
> Thank you!

I'm sorry, I'm not that familiar with the code.
A brief look tells me that it's likely none of the patches are quite right; http://llvm.org/bugs/attachment.cgi?id=10897 looks closer, but still not quite right. -Eli > > On Thu, Jul 18, 2013 at 7:56 PM, Eli Friedman > wrote: >> >> On Thu, Jul 18, 2013 at 8:36 AM, Yun-Wei Lee wrote: >> > http://llvm.org/bugs/show_bug.cgi?id=14792 >> > >> > Problem: >> > In the i386 ABI Page 3-10, it said that the stack is aligned. However, >> > the >> > two example code show that does not handle the alignment correctly when >> > using variadic function. For example, if the size of the first argument >> > is >> > 17, the overflow_arg_area in va_list will be set to "address of first >> > argument + 16" instead of "address of first argument + 24" after calling >> > va_start. >> > In addition, #6636 showed the same problem because in AMD64, arguments >> > is >> > passed by register at first, then pass by memory when run out of >> > register >> > (AMD64 ABI 3.5.7 rule 10). >> > >> > Why this problem happened? >> > When calling va_start to set va_list, overflow_arg_area is not set >> > correctly. To set the overflow_arg_area correctly, we need to get the >> > FrameIndex correctly. Now, here comes the problem, llvm doesn't handle >> > it >> > correctly. It accounts for StackSize to compute the FrameIndex, and if >> > the >> > StackSize is not aligned, it will compute the wrong FrameIndex. As a >> > result >> > overflow_arg_area will not be set correctly. >> > >> > My Solution: >> > 1. Record the Align if it is located in Memory. >> > 2. If it is variadic function and needs to set FrameIndex, adjust the >> > stacksize. >> >> Please read http://llvm.org/docs/DeveloperPolicy.html . In >> particular, patches should be sent to llvm-commits, and patches should >> generally include a regression test. >> >> In terms of the code, you might want to consider using >> llvm::RoundUpToAlignment. 
>> >> -Eli > > From qcolombet at apple.com Fri Jul 19 18:46:39 2013 From: qcolombet at apple.com (Quentin Colombet) Date: Fri, 19 Jul 2013 18:46:39 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <643A0F0E-DA20-4E47-B84F-5BE6912E5145@apple.com> <7BFDA791-4915-4785-8179-8D9D6C5367B5@apple.com> Message-ID: Hi Chandler, > Le 19 juil. 2013 à 17:48, Chandler Carruth a écrit : > > Sorry for the delays, just haven't been able to get back around to this. My silence isn't about not caring. > >> On Fri, Jul 19, 2013 at 5:13 PM, Quentin Colombet wrote: >> Like I summed up in my previous email (see below), there are two orthogonal approaches to this problem. >> I have chosen the second one (extend the reporting already used for inline asm). >> >> Regarding the first approach (front-end querying the back-end to build up diagnostic), it needs further discussion if anyone wants to go in that direction. Although I think this is interesting, I leave that for someone else. > > I really don't like the second approach. > > Fundamentally, it increasingly couples LLVM to Clang's warning infrastructure and I think that's a bad thing. I think we are better served by designing good interfaces for the frontend and the backend to communicate in order for different frontends to provide different types and kinds of information to users based on their needs. > > Bob described one potential API for this, but I think there are much better ones. I don't think we have to be constrained by the idea of using a pass to compute and communicate this information -- there are other approaches. I haven't thought about them because I haven't designed such an interface. If I had, I would just contribute it. I do think that this is the correct design direction though, as I don't think that there is one unified way to represent diagnostics in the backend, or if there is I don't tihnk we have any idea what it looks like. 
> > Inline assembly is more of a special case than anything else. I don't really see why the right path forward is to design a generic framework around what was originally built for its specific purpose. > > If anything, I think the design you are pursuing is strictly more complex and invasive, and without any tangible benefits as a consequence. Fair enough. Let us go back into the discussion then :). The general question now would be what should we expose and how could we make it easily available to a front-end. Also one thing that is missing, what do we do if there is not a front end? In other words how are we supposed to abort if something bad happen when nobody is querying for diagnostics? Q. -------------- next part -------------- An HTML attachment was scrubbed... URL: From eli.friedman at gmail.com Fri Jul 19 19:03:03 2013 From: eli.friedman at gmail.com (Eli Friedman) Date: Fri, 19 Jul 2013 19:03:03 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <643A0F0E-DA20-4E47-B84F-5BE6912E5145@apple.com> <7BFDA791-4915-4785-8179-8D9D6C5367B5@apple.com> Message-ID: On Fri, Jul 19, 2013 at 6:46 PM, Quentin Colombet wrote: > Hi Chandler, > > Le 19 juil. 2013 à 17:48, Chandler Carruth a écrit : > > Sorry for the delays, just haven't been able to get back around to this. My > silence isn't about not caring. > > On Fri, Jul 19, 2013 at 5:13 PM, Quentin Colombet > wrote: >> >> Like I summed up in my previous email (see below), there are two >> orthogonal approaches to this problem. >> I have chosen the second one (extend the reporting already used for inline >> asm). >> >> Regarding the first approach (front-end querying the back-end to build up >> diagnostic), it needs further discussion if anyone wants to go in that >> direction. Although I think this is interesting, I leave that for someone >> else. > > > I really don't like the second approach. 
> > Fundamentally, it increasingly couples LLVM to Clang's warning > infrastructure and I think that's a bad thing. I think we are better served > by designing good interfaces for the frontend and the backend to communicate > in order for different frontends to provide different types and kinds of > information to users based on their needs. > > Bob described one potential API for this, but I think there are much better > ones. I don't think we have to be constrained by the idea of using a pass to > compute and communicate this information -- there are other approaches. I > haven't thought about them because I haven't designed such an interface. If > I had, I would just contribute it. I do think that this is the correct > design direction though, as I don't think that there is one unified way to > represent diagnostics in the backend, or if there is I don't tihnk we have > any idea what it looks like. > > Inline assembly is more of a special case than anything else. I don't really > see why the right path forward is to design a generic framework around what > was originally built for its specific purpose. > > If anything, I think the design you are pursuing is strictly more complex > and invasive, and without any tangible benefits as a consequence. > > Fair enough. > > Let us go back into the discussion then :). > > The general question now would be what should we expose and how could we > make it easily available to a front-end. > > Also one thing that is missing, what do we do if there is not a front end? > In other words how are we supposed to abort if something bad happen when > nobody is querying for diagnostics? We can write a default diagnostic handler suitable for something like llc, where there isn't really any source code to point at. 
-Eli

From peter at uformia.com  Sat Jul 20 01:08:57 2013
From: peter at uformia.com (Peter Newman)
Date: Sat, 20 Jul 2013 18:08:57 +1000
Subject: [LLVMdev] Another memory alignment issue with SSE operations
Message-ID: <51EA4599.9060204@uformia.com>

Unfortunately, I've run into a second issue where addpd is being performed on memory that isn't 16-byte aligned. Again, this only happens if the createJIT OptLevel is set to Default (vs None).

According to http://www.jaist.ac.jp/iscenter-new/mpc/altix/altixdata/opt/intel/vtune/doc/users_guide/mergedProjects/analyzer_ec/mergedProjects/reference_olh/mergedProjects/instructions/instruct32_hh/vc8a.htm that will cause a GPF.

I've attached the LLVM IR and a copy of the Disassembly this results in. The crash occurs at 00370872.

At the time of the crash, ESP is set to 0018EEF8 - a value that is not 16-byte aligned. I notice that the offset is aligned, though.

The crash occurs on the first instance of addpd applied to the stack (which, as I understand it, is what ESP is used for).

This also raises the question of whether it would be worth requiring alignment of the function stack to improve performance (assuming movapd is faster than movupd). I'm not expecting LLVM to recognize this (although it would be neat), but is this something worth setting ourselves, knowing we're going to be using mostly SSE instructions? And how would we do that?

--
Peter N
-------------- next part --------------
A non-text attachment was scrubbed...
Name: addpd-unaligned.zip
Type: application/octet-stream
Size: 42810 bytes
Desc: not available
URL: 

From peter at uformia.com  Sat Jul 20 01:58:53 2013
From: peter at uformia.com (Peter Newman)
Date: Sat, 20 Jul 2013 18:58:53 +1000
Subject: [LLVMdev] Another memory alignment issue with SSE operations
In-Reply-To: <51EA4599.9060204@uformia.com>
References: <51EA4599.9060204@uformia.com>
Message-ID: <51EA514D.6040405@uformia.com>
This avoids this crash, and as a bonus, causes the resulting code to use movapd instead of movupd - so is potentially faster code, too. -- Peter N On 20/07/2013 6:08 PM, Peter Newman wrote: > Unfortunately, I've ran into a second issue where addpd is being > performed on memory that isn't 16 byte aligned. Again, this only > happens if the createJIT OptLevel is set to Default (vs None). > > According to > http://www.jaist.ac.jp/iscenter-new/mpc/altix/altixdata/opt/intel/vtune/doc/users_guide/mergedProjects/analyzer_ec/mergedProjects/reference_olh/mergedProjects/instructions/instruct32_hh/vc8a.htm > > that will cause a GPF. > > I've attached the LLVM IR and a copy of the Disassembly this results > in. The crash occurs at 00370872 > > At the time of the crash, ESP is set to 0018EEF8 - this results in a > value is not 16 byte aligned. I notice that the offset is aligned though. > > The crash occurs on the first instance of addpd applied to the stack > (as I understand ESP is used for). > > There is also raises the question of would it be worth requiring > alignment of the function stack to improve performance (assuming > movapd is faster then movupd). I'm not expecting LLVM to recognize > this (although it would be neat) but is this something worth setting > ourselves, knowing we're going to be using mostly SSE instructions? > And how would we do that? > > -- > Peter N > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonas.paulsson at ericsson.com Sat Jul 20 05:07:36 2013 From: jonas.paulsson at ericsson.com (Jonas Paulsson) Date: Sat, 20 Jul 2013 12:07:36 +0000 Subject: [LLVMdev] AsmPrinter Message-ID: <2C9CFDA9EB8E5D43AAF41FA6A20623111C798922@ESESSMB201.ericsson.se> Hi, I would like to access the AsmPrinter MachineFunctionPass during compilation in order to do alternative dumping of instructions, instead of using MI->dump(), which can get a bit messy. Is there any way to access this object or the assembler strings? It seems that even these strings / methods are not available through any static methods. Does it exist during the whole compilation or is it created by PassManager in the end? /Jonas -------------- next part -------------- An HTML attachment was scrubbed... URL: From peter at uformia.com Sat Jul 20 00:44:47 2013 From: peter at uformia.com (Peter Newman) Date: Sat, 20 Jul 2013 17:44:47 +1000 Subject: [LLVMdev] fptoui calling a function that modifies ECX In-Reply-To: References: <51E5F5ED.1000808@uformia.com> <51E8BBE9.8020407@uformia.com> <51E8CADA.1040506@uformia.com> <51E8CE40.3050703@uformia.com> <51E8D26B.3000108@uformia.com> <51E8D468.7030509@uformia.com> <51E8DE10.9090900@uformia.com> <51E8E123.2030402@uformia.com> <51E8E3D0.5040807@uformia.com> <51E8E994.3030400@uformia.com> <51E90826.3060201@uformia.com> Message-ID: <51EA3FEF.8070708@uformia.com> I've applied this and the test cases I have here continue to work, so it looks good to me. I've ran into another (seemingly unrelated) issue which I'll describe in a separate email to the dev list. -- Peter N On 20/07/2013 5:30 AM, Craig Topper wrote: > Here's my attempt at a fix. Adding Jakob to make sure I did this right. > > > On Fri, Jul 19, 2013 at 2:34 AM, Peter Newman > wrote: > > That does appear to have worked. All my tests are passing now. > > I'll hand this out to our other devs & testers and make sure it's > working for them as well (not just on my machine). > > Thank you, again. 
> > -- > Peter N > > > On 19/07/2013 5:45 PM, Craig Topper wrote: >> I don't think that's going to work. >> >> >> On Fri, Jul 19, 2013 at 12:24 AM, Peter Newman > > wrote: >> >> Thank you, I'm trying this now. >> >> >> On 19/07/2013 5:23 PM, Craig Topper wrote: >>> Try adding ECX to the Defs of this part of >>> lib/Target/X86/X86InstrCompiler.td like I've done below. I >>> don't have a Windows machine to test myself. >>> >>> let Defs = [EAX, EDX, ECX, EFLAGS], FPForm = SpecialFP in { >>> def WIN_FTOL_32 : I<0, Pseudo, (outs), (ins RFP32:$src), >>> "# win32 fptoui", >>> [(X86WinFTOL RFP32:$src)]>, >>> Requires<[In32BitMode]>; >>> >>> def WIN_FTOL_64 : I<0, Pseudo, (outs), (ins RFP64:$src), >>> "# win32 fptoui", >>> [(X86WinFTOL RFP64:$src)]>, >>> Requires<[In32BitMode]>; >>> } >>> >>> >>> On Thu, Jul 18, 2013 at 11:59 PM, Peter Newman >>> > wrote: >>> >>> Oh, excellent point, I agree. My bad. Now that I'm not >>> assuming those are the sqrt, I see the sqrtpd's in the >>> output. Also there are three fptoui's and there are 3 >>> call instances. >>> >>> (Changing subject line again.) >>> >>> Now it looks like it's bug #13862 >>> >>> On 19/07/2013 4:51 PM, Craig Topper wrote: >>>> I think those calls correspond to this >>>> >>>> %110 = fptoui double %109 to i32 >>>> >>>> The calls are followed by an imul with 12 which matches >>>> up with what occurs right after the fptoui in the IR. >>>> >>>> >>>> On Thu, Jul 18, 2013 at 11:48 PM, Peter Newman >>>> > wrote: >>>> >>>> Yes, that is the result of module-dump.ll >>>> >>>> >>>> On 19/07/2013 4:46 PM, Craig Topper wrote: >>>>> Does this correspond to one of the .ll files you >>>>> sent earlier? >>>>> >>>>> >>>>> On Thu, Jul 18, 2013 at 11:34 PM, Peter Newman >>>>> > wrote: >>>>> >>>>> (Changing subject line as diagnosis has changed) >>>>> >>>>> I'm attaching the compiled code that I've been >>>>> getting, both with CodeGenOpt::Default and >>>>> CodeGenOpt::None . 
The crash isn't occurring >>>>> with CodeGenOpt::None, but that seems to be >>>>> because ECX isn't being used - it still gets >>>>> set to 0x7fffffff by one of the calls to 76719BA1 >>>>> >>>>> I notice that X86::SQRTPD[m|r] appear in >>>>> X86InstrInfo::isHighLatencyDef. I was thinking >>>>> an optimization might be removing it, but I >>>>> don't get the sqrtpd instruction even if the >>>>> createJIT optimization level turned off. >>>>> >>>>> I am trying this with the Release 3.3 code - >>>>> I'll try it with trunk and see if I get a >>>>> different result there. Maybe there was a >>>>> recent commit for this. >>>>> >>>>> -- >>>>> Peter N >>>>> >>>>> On 19/07/2013 4:00 PM, Craig Topper wrote: >>>>>> Hmm, I'm not able to get those .ll files to >>>>>> compile if I disable SSE and I end up with >>>>>> SSE instructions(including sqrtpd) if I don't >>>>>> disable it. >>>>>> >>>>>> >>>>>> On Thu, Jul 18, 2013 at 10:53 PM, Peter >>>>>> Newman >>>>> > wrote: >>>>>> >>>>>> Is there something specifically required >>>>>> to enable SSE? If it's not detected as >>>>>> available (based from the target triple?) >>>>>> then I don't think we enable it specifically. >>>>>> >>>>>> Also it seems that it should handle >>>>>> converting to/from the vector types, >>>>>> although I can see it getting confused >>>>>> about needing to do that if it thinks SSE >>>>>> isn't available at all. >>>>>> >>>>>> >>>>>> On 19/07/2013 3:47 PM, Craig Topper wrote: >>>>>>> Hmm, maybe sse isn't being enabled so >>>>>>> its falling back to emulating sqrt? >>>>>>> >>>>>>> >>>>>>> On Thu, Jul 18, 2013 at 10:45 PM, Peter >>>>>>> Newman >>>>>> > wrote: >>>>>>> >>>>>>> In the disassembly, I'm seeing three >>>>>>> cases of >>>>>>> call 76719BA1 >>>>>>> >>>>>>> I am assuming this is the sqrt >>>>>>> function as this is the only >>>>>>> function called in the LLVM IR. 
>>>>>>> >>>>>>> The code at 76719BA1 is: >>>>>>> >>>>>>> 76719BA1 push ebp >>>>>>> 76719BA2 mov ebp,esp >>>>>>> 76719BA4 sub esp,20h >>>>>>> 76719BA7 and esp,0FFFFFFF0h >>>>>>> 76719BAA fld st(0) >>>>>>> 76719BAC fst dword ptr [esp+18h] >>>>>>> 76719BB0 fistp qword ptr [esp+10h] >>>>>>> 76719BB4 fild qword ptr [esp+10h] >>>>>>> 76719BB8 mov edx,dword ptr [esp+18h] >>>>>>> 76719BBC mov eax,dword ptr [esp+10h] >>>>>>> 76719BC0 test eax,eax >>>>>>> 76719BC2 je 76719DCF >>>>>>> 76719BC8 fsubp st(1),st >>>>>>> 76719BCA test edx,edx >>>>>>> 76719BCC js 7671F9DB >>>>>>> 76719BD2 fstp dword ptr [esp] >>>>>>> 76719BD5 mov ecx,dword ptr [esp] >>>>>>> 76719BD8 add ecx,7FFFFFFFh >>>>>>> 76719BDE sbb eax,0 >>>>>>> 76719BE1 mov edx,dword ptr [esp+14h] >>>>>>> 76719BE5 sbb edx,0 >>>>>>> 76719BE8 leave >>>>>>> 76719BE9 ret >>>>>>> >>>>>>> >>>>>>> As you can see at 76719BD5, it >>>>>>> modifies ECX . >>>>>>> >>>>>>> I don't know that this is the sqrtpd >>>>>>> function (for example, I'm not >>>>>>> seeing any SSE instructions here?) >>>>>>> but whatever it is, it's being >>>>>>> called from the IR I attached >>>>>>> earlier, and is modifying ECX under >>>>>>> some circumstances. >>>>>>> >>>>>>> >>>>>>> On 19/07/2013 3:29 PM, Craig Topper >>>>>>> wrote: >>>>>>>> That should map directly to sqrtpd >>>>>>>> which can't modify ecx. >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Jul 18, 2013 at 10:27 PM, >>>>>>>> Peter Newman >>>>>>> > wrote: >>>>>>>> >>>>>>>> Sorry, that should have been >>>>>>>> llvm.x86.sse2.sqrt.pd >>>>>>>> >>>>>>>> >>>>>>>> On 19/07/2013 3:25 PM, Craig >>>>>>>> Topper wrote: >>>>>>>>> What is >>>>>>>>> "frep.x86.sse2.sqrt.pd". I'm >>>>>>>>> only familiar with things >>>>>>>>> prefixed with "llvm.x86". >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Jul 18, 2013 at 10:12 >>>>>>>>> PM, Peter Newman >>>>>>>>> >>>>>>>> > wrote: >>>>>>>>> >>>>>>>>> After stepping through the >>>>>>>>> produced assembly, I >>>>>>>>> believe I have a culprit. 
>>>>>>>>> >>>>>>>>> One of the calls to >>>>>>>>> @frep.x86.sse2.sqrt.pd is >>>>>>>>> modifying the value of ECX >>>>>>>>> - while the produced code >>>>>>>>> is expecting it to still >>>>>>>>> contain its previous value. >>>>>>>>> >>>>>>>>> Peter N >>>>>>>>> >>>>>>>>> >>>>>>>>> On 19/07/2013 2:09 PM, >>>>>>>>> Peter Newman wrote: >>>>>>>>>> I've attached the >>>>>>>>>> module->dump() that our >>>>>>>>>> code is producing. >>>>>>>>>> Unfortunately this is the >>>>>>>>>> smallest test case I have >>>>>>>>>> available. >>>>>>>>>> >>>>>>>>>> This is before any >>>>>>>>>> optimization passes are >>>>>>>>>> applied. There are two >>>>>>>>>> separate modules in >>>>>>>>>> existence at the time, >>>>>>>>>> and there are no >>>>>>>>>> guarantees about the >>>>>>>>>> order the surrounding >>>>>>>>>> code calls those >>>>>>>>>> functions, so there may >>>>>>>>>> be some interaction >>>>>>>>>> between them? There >>>>>>>>>> shouldn't be, they don't >>>>>>>>>> refer to any common >>>>>>>>>> memory etc. There is no >>>>>>>>>> multi-threading occurring. >>>>>>>>>> >>>>>>>>>> The function in >>>>>>>>>> module-dump.ll (called >>>>>>>>>> crashfunc in this file) >>>>>>>>>> is called with >>>>>>>>>> - func_params 0x0018f3b0 >>>>>>>>>> double [3] >>>>>>>>>> [0x0] -11.339976634695301 >>>>>>>>>> double >>>>>>>>>> [0x1] -9.7504239056205506 >>>>>>>>>> double >>>>>>>>>> [0x2] -5.2900856817382804 >>>>>>>>>> double >>>>>>>>>> at the time of the exception. >>>>>>>>>> >>>>>>>>>> This is compiled on a >>>>>>>>>> "i686-pc-win32" triple. >>>>>>>>>> All of the non-intrinsic >>>>>>>>>> functions referred to in >>>>>>>>>> these modules are the >>>>>>>>>> standard equivalents from >>>>>>>>>> the MSVC library (e.g. >>>>>>>>>> @asin is the standard C >>>>>>>>>> lib double asin( >>>>>>>>>> double ) ). >>>>>>>>>> >>>>>>>>>> Hopefully this is >>>>>>>>>> reproducible for you. 
>>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> PeterN >>>>>>>>>> >>>>>>>>>> On 18/07/2013 4:37 PM, >>>>>>>>>> Craig Topper wrote: >>>>>>>>>>> Are you able to send any >>>>>>>>>>> IR for others to >>>>>>>>>>> reproduce this issue? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Jul 17, 2013 at >>>>>>>>>>> 11:23 PM, Peter Newman >>>>>>>>>>> >>>>>>>>>> > >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Unfortunately, this >>>>>>>>>>> doesn't appear to be >>>>>>>>>>> the bug I'm hitting. >>>>>>>>>>> I applied the fix to >>>>>>>>>>> my source and it >>>>>>>>>>> didn't make a >>>>>>>>>>> difference. >>>>>>>>>>> >>>>>>>>>>> Also further testing >>>>>>>>>>> found me getting the >>>>>>>>>>> same behavior with >>>>>>>>>>> other SIMD >>>>>>>>>>> instructions. The >>>>>>>>>>> common factor is in >>>>>>>>>>> each case, ECX is >>>>>>>>>>> set to 0x7fffffff, >>>>>>>>>>> and it's an >>>>>>>>>>> operation using xmm >>>>>>>>>>> ptr ecx+offset . >>>>>>>>>>> >>>>>>>>>>> Additionally, >>>>>>>>>>> turning the >>>>>>>>>>> optimization level >>>>>>>>>>> passed to createJIT >>>>>>>>>>> down appears to >>>>>>>>>>> avoid it, so I'm now >>>>>>>>>>> leaning towards a >>>>>>>>>>> bug in one of the >>>>>>>>>>> optimization passes. >>>>>>>>>>> >>>>>>>>>>> I'm going to dig >>>>>>>>>>> through the passes >>>>>>>>>>> controlled by that >>>>>>>>>>> parameter and see if >>>>>>>>>>> I can narrow down >>>>>>>>>>> which optimization >>>>>>>>>>> is causing it. >>>>>>>>>>> >>>>>>>>>>> Peter N >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 17/07/2013 1:58 >>>>>>>>>>> PM, Solomon Boulos >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> As someone off >>>>>>>>>>> list just told >>>>>>>>>>> me, perhaps my >>>>>>>>>>> new bug is the >>>>>>>>>>> same issue: >>>>>>>>>>> >>>>>>>>>>> http://llvm.org/bugs/show_bug.cgi?id=16640 >>>>>>>>>>> >>>>>>>>>>> Do you happen to >>>>>>>>>>> be using FastISel? 
>>>>>>>>>>> >>>>>>>>>>> Solomon >>>>>>>>>>> >>>>>>>>>>> On Jul 16, 2013, >>>>>>>>>>> at 6:39 PM, >>>>>>>>>>> Peter Newman >>>>>>>>>>> >>>>>>>>>> > >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Hello all, >>>>>>>>>>> >>>>>>>>>>> I'm >>>>>>>>>>> currently in >>>>>>>>>>> the process >>>>>>>>>>> of debugging >>>>>>>>>>> a crash >>>>>>>>>>> occurring in >>>>>>>>>>> our program. >>>>>>>>>>> In LLVM 3.2 >>>>>>>>>>> and 3.3 it >>>>>>>>>>> appears that >>>>>>>>>>> JIT >>>>>>>>>>> generated >>>>>>>>>>> code is >>>>>>>>>>> attempting >>>>>>>>>>> to perform >>>>>>>>>>> access >>>>>>>>>>> unaligned >>>>>>>>>>> memory with >>>>>>>>>>> a SSE2 >>>>>>>>>>> instruction. >>>>>>>>>>> However this >>>>>>>>>>> only happens >>>>>>>>>>> under >>>>>>>>>>> certain >>>>>>>>>>> conditions >>>>>>>>>>> that seem >>>>>>>>>>> (but may not >>>>>>>>>>> be) related >>>>>>>>>>> to the >>>>>>>>>>> stacks state >>>>>>>>>>> on calling >>>>>>>>>>> the function. >>>>>>>>>>> >>>>>>>>>>> Our program >>>>>>>>>>> acts as a >>>>>>>>>>> front-end, >>>>>>>>>>> using the >>>>>>>>>>> LLVM C++ API >>>>>>>>>>> to generate >>>>>>>>>>> a JIT >>>>>>>>>>> generated >>>>>>>>>>> function. >>>>>>>>>>> This >>>>>>>>>>> function is >>>>>>>>>>> primarily >>>>>>>>>>> mathematical, so >>>>>>>>>>> we use the >>>>>>>>>>> Vector types >>>>>>>>>>> to take >>>>>>>>>>> advantage of >>>>>>>>>>> SIMD >>>>>>>>>>> instructions >>>>>>>>>>> (as well as >>>>>>>>>>> a few SSE2 >>>>>>>>>>> intrinsics). >>>>>>>>>>> >>>>>>>>>>> This worked >>>>>>>>>>> in LLVM 2.8 >>>>>>>>>>> but started >>>>>>>>>>> failing in >>>>>>>>>>> 3.2 and has >>>>>>>>>>> continued to >>>>>>>>>>> fail in 3.3. >>>>>>>>>>> It fails >>>>>>>>>>> with no >>>>>>>>>>> optimizations applied >>>>>>>>>>> to the LLVM >>>>>>>>>>> Function/Module. 
>>>>>>>>>>> It crashes >>>>>>>>>>> with what is >>>>>>>>>>> reported as >>>>>>>>>>> a memory >>>>>>>>>>> access error >>>>>>>>>>> (accessing >>>>>>>>>>> 0xffffffff), >>>>>>>>>>> however it's >>>>>>>>>>> suggested >>>>>>>>>>> that this is >>>>>>>>>>> how the SSE >>>>>>>>>>> fault >>>>>>>>>>> raising >>>>>>>>>>> mechanism >>>>>>>>>>> appears. >>>>>>>>>>> >>>>>>>>>>> The >>>>>>>>>>> generated >>>>>>>>>>> instruction >>>>>>>>>>> varies, but >>>>>>>>>>> it seems to >>>>>>>>>>> often be >>>>>>>>>>> similar to >>>>>>>>>>> (I don't >>>>>>>>>>> have it in >>>>>>>>>>> front of me, >>>>>>>>>>> sorry): >>>>>>>>>>> movapd xmm0, >>>>>>>>>>> xmm[ecx+0x???????] >>>>>>>>>>> Where the >>>>>>>>>>> xmm register >>>>>>>>>>> changes, and >>>>>>>>>>> the second >>>>>>>>>>> parameter is >>>>>>>>>>> a memory access. >>>>>>>>>>> ECX is >>>>>>>>>>> always set >>>>>>>>>>> to 0x7ffffff >>>>>>>>>>> - however I >>>>>>>>>>> don't know >>>>>>>>>>> if this is >>>>>>>>>>> part of the >>>>>>>>>>> SSE error >>>>>>>>>>> reporting >>>>>>>>>>> process or >>>>>>>>>>> is part of >>>>>>>>>>> the >>>>>>>>>>> situation >>>>>>>>>>> causing the >>>>>>>>>>> error. >>>>>>>>>>> >>>>>>>>>>> I haven't >>>>>>>>>>> worked out >>>>>>>>>>> exactly what >>>>>>>>>>> code path >>>>>>>>>>> etc is >>>>>>>>>>> causing this >>>>>>>>>>> crash. I'm >>>>>>>>>>> hoping that >>>>>>>>>>> someone can >>>>>>>>>>> tell me if >>>>>>>>>>> there were >>>>>>>>>>> any changed >>>>>>>>>>> requirements >>>>>>>>>>> for working >>>>>>>>>>> with SIMD in >>>>>>>>>>> LLVM 3.2 (or >>>>>>>>>>> earlier, we >>>>>>>>>>> haven't >>>>>>>>>>> tried 3.0 or >>>>>>>>>>> 3.1). 
I >>>>>>>>>>> currently >>>>>>>>>>> suspect the >>>>>>>>>>> use of >>>>>>>>>>> GlobalVariable >>>>>>>>>>> (we first >>>>>>>>>>> discovered >>>>>>>>>>> the crash >>>>>>>>>>> when using a >>>>>>>>>>> feature that >>>>>>>>>>> uses them), >>>>>>>>>>> however I >>>>>>>>>>> have >>>>>>>>>>> attempted >>>>>>>>>>> using >>>>>>>>>>> setAlignment >>>>>>>>>>> on the >>>>>>>>>>> GlobalVariables >>>>>>>>>>> without any >>>>>>>>>>> change. >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Peter N >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> LLVM >>>>>>>>>>> Developers >>>>>>>>>>> mailing list >>>>>>>>>>> LLVMdev at cs.uiuc.edu >>>>>>>>>>> >>>>>>>>>>> http://llvm.cs.uiuc.edu >>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> LLVM Developers >>>>>>>>>>> mailing list >>>>>>>>>>> LLVMdev at cs.uiuc.edu >>>>>>>>>>> >>>>>>>>>>> http://llvm.cs.uiuc.edu >>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> ~Craig >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> ~Craig >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> ~Craig >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> ~Craig >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> ~Craig >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> ~Craig >>>> >>>> >>>> >>>> >>>> -- >>>> ~Craig >>> >>> >>> >>> >>> -- >>> ~Craig >> >> >> >> >> -- >> ~Craig > > > > > -- > ~Craig -------------- next part -------------- An HTML attachment was scrubbed... URL: From aschwaighofer at apple.com Sat Jul 20 09:52:13 2013 From: aschwaighofer at apple.com (Arnold Schwaighofer) Date: Sat, 20 Jul 2013 11:52:13 -0500 Subject: [LLVMdev] Disable vectorization for unaligned data In-Reply-To: References: Message-ID: On Jul 19, 2013, at 3:14 PM, Francois Pichet wrote: > > What is the proper solution to disable auto-vectorization for unaligned data? 
> > I have an out of tree target and I added this: > > bool OpusTargetLowering::allowsUnalignedMemoryAccesses(EVT VT, bool *Fast) const { > if (VT.isVector()) > return false; > .... > } > > After that, I could see that vectorization is still done on unaligned data, except that llvm will copy the data back and forth from the source to the top of the stack and work from there. This is very costly; I'd rather get scalar operations. > > Then I tried to add: > unsigned getMemoryOpCost(unsigned Opcode, Type *Src, > unsigned Alignment, > unsigned AddressSpace) const { > if (Src->isVectorTy() && Alignment != 16) > return 10000; // <== high number to try to avoid unaligned load/store. > return TargetTransformInfo::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace); > } > > Except that this doesn't work, because Alignment will always be 4 even for data like: > int data[16][16] __attribute__ ((aligned (16))), > > because individual elements are still 4-byte aligned. We will have to hook up some logic in the loop vectorizer that computes the alignment of the vectorized version of the memory access so that we can pass it to "getMemoryOpCost". Currently, as you have observed, we will just pass the scalar loop's memory access alignment, which will be pessimistic. Instcombine will later replace the alignment with a stronger variant for vectorized code, but that is obviously too late for the cost model in the vectorizer. From bob.wilson at apple.com Sat Jul 20 11:38:05 2013 From: bob.wilson at apple.com (Bob Wilson) Date: Sat, 20 Jul 2013 11:38:05 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <643A0F0E-DA20-4E47-B84F-5BE6912E5145@apple.com> <7BFDA791-4915-4785-8179-8D9D6C5367B5@apple.com> Message-ID: <2B9C96D9-8012-4304-8E5F-2C624D87AA84@apple.com> On Jul 19, 2013, at 5:48 PM, Chandler Carruth wrote: > Sorry for the delays, just haven't been able to get back around to this.
My silence isn't about not caring. > > On Fri, Jul 19, 2013 at 5:13 PM, Quentin Colombet wrote: > Like I summed up in my previous email (see below), there are two orthogonal approaches to this problem. > I have chosen the second one (extend the reporting already used for inline asm). > > Regarding the first approach (front-end querying the back-end to build up diagnostic), it needs further discussion if anyone wants to go in that direction. Although I think this is interesting, I leave that for someone else. > > I really don't like the second approach. > > Fundamentally, it increasingly couples LLVM to Clang's warning infrastructure and I think that's a bad thing. I think we are better served by designing good interfaces for the frontend and the backend to communicate in order for different frontends to provide different types and kinds of information to users based on their needs. I disagree. The interface that Quentin sketched out isn’t coupling LLVM’s warnings to Clang at all. When used with Clang, it allows all of Clang’s nice diagnostic machinery to work well, but it isn’t tied to that. The proposal has a generic hook that any frontend or driver can use to be notified of warnings from LLVM. That hook would provide a simple and clean interface with just the basic information: - severity level (warning or error) - an enum to indicate the kind of diagnostic (InlineAsm, StackSize, etc.) How is that coupling us to Clang’s warning infrastructure? It’s just a hook, and it’s very simple. > > Bob described one potential API for this, but I think there are much better ones. I don't think we have to be constrained by the idea of using a pass to compute and communicate this information -- there are other approaches. I haven't thought about them because I haven't designed such an interface. If I had, I would just contribute it. 
I do think that this is the correct design direction though, as I don't think that there is one unified way to represent diagnostics in the backend, or if there is I don't think we have any idea what it looks like. I wasn't trying to describe a potential API. I was trying to figure out what you were proposing, and I don't think adding new passes would work well at all. I can't think of any reasonable way to design such an interface. Can you at least sketch out an example of what you are proposing? Without that, I think there's really only one proposal on the table. > > Inline assembly is more of a special case than anything else. I don't really see why the right path forward is to design a generic framework around what was originally built for its specific purpose. If we're going to add a generic framework, shouldn't the minimum requirement be that it support the current functionality? > > If anything, I think the design you are pursuing is strictly more complex and invasive, and without any tangible benefits as a consequence. What complexity do you see? It seems very, very simple to me. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hfinkel at anl.gov Sat Jul 20 12:28:18 2013 From: hfinkel at anl.gov (Hal Finkel) Date: Sat, 20 Jul 2013 14:28:18 -0500 (CDT) Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: Message-ID: <1713481525.13245448.1374348498377.JavaMail.root@alcf.anl.gov> Hi Quentin, I have a particular use case that I'd like this infrastructure to support, and I'm not sure if your proposal captures this or not: hints from the autovectorization passes (and other loop optimizations).
Specifically, I'd like to be able to relay messages like this: - "*This loop* cannot be optimized because the induction variable *i* is unsigned, and cannot be proved not to wrap" - "*This loop* cannot be vectorized because the compiler cannot prove that memory read from *a* does not overlap with memory written to through *b*" - "*This loop* cannot be vectorized because the compiler cannot prove that it is unconditionally safe to dereference the pointer *a*" and I'd like the frontend to get enough information through the interface to turn the starred items back into higher-level-language constructs (that way they can be turned into hyperlinks, underlined, or whatever). I think this might be possible by providing references to any associated debug metadata. Maybe we can define some simple markup, and have each marked item correspond to an entry in an array of metadata pointers? What do you think? Thanks, Hal ----- Original Message ----- > > Hi, > > As no new suggestions came in, I would like to share my plans to move > forward. > > — Feedback is very welcome! -- > > Like I summed up in my previous email (see below), there are two > orthogonal approaches to this problem. > I have chosen the second one (extend the reporting already used for > inline asm). > > Regarding the first approach (front-end querying the back-end to > build up diagnostics), it needs further discussion if anyone wants to > go in that direction. Although I think this is interesting, I leave > that for someone else. > > > > ** Proposed Plan ** > > Extend what already exists: the back-end reports to the front-end through > hooks. > > The detailed plan is the following (still hand-wavy; I did not look > into the actual code base, just had some high-level discussions): > 1. LLVM: Extend the current diagnostic hook to pass: > 1.1. A kind of diagnostic (InlineAsm, StackSize, Other). > 1.2. A boolean hinting whether the diagnostic is an error or a > warning. > 2.
Clang: Map the diagnostic to the proper (pre-defined) group based > on the passed kind. > > > > > ** Remarks ** > > > 1.1. This is supposed to change rarely, which is why a static approach > should do the trick. The 'Other' kind represents a fall-back case > for things we did not anticipate (do not get too hung up on the > names; they have to be defined (name, which kind of back-end > diagnostic we want to expose, etc.)). > 1.2. Not every front-end has clang's capabilities, so this > boolean will help the "front-end" figure out what it should do > with that diagnostic (e.g., keep going or abort). Clang may ignore > them. > > > 2. Obviously, the kind of diagnostic has to be shared between Clang > and LLVM; this is the awkward part, but again, it should rarely > change. > > > > > Thanks again for all the contributions made so far and for the future > ones as well! > > > Cheers, > > > -Quentin > > > > On Jul 17, 2013, at 11:34 AM, Quentin Colombet < qcolombet at apple.com > > wrote: > > > > Hi, > > > Thanks, all, for the insightful comments. > > > Let me sum up at a high level what proposals we actually have (sorry > if I misinterpreted or missed something; do not hesitate to correct > me): > > > 1. Make LLVM define some APIs that expose some internal information > so that a front-end can use them to build up diagnostics. > > > 2. Make LLVM build up a diagnostic and let a front-end map this > diagnostic to the right warning/error group. > > > > > In my opinion, both approaches are orthogonal and have different > advantages based on what goal we are pursuing. > > > To be more specific, with the first approach, front-end people can > come up with new warnings/errors and diagnostics without having to > modify the back-end.
On the other hand, with the second approach, > back-end people can emit new warnings/errors without having to > modify the front-end (BTW, it would be nice if we came up with, at > least in clang, a consistent way to pass down options for emitting > back-end warnings/errors when applicable, i.e., with -W rather than > -mllvm). > > > What do people think? > > > Thanks again for sharing your thoughts; I appreciate it. > > > Cheers, > > > > -Quentin > > > On Jul 17, 2013, at 9:38 AM, Evan Cheng < evan.cheng at apple.com > > wrote: > > > > > > > Sent from my iPad > > On Jul 17, 2013, at 8:53 AM, Bob Wilson < bob.wilson at apple.com > > wrote: > > > > > > > On Jul 17, 2013, at 2:12 AM, Chandler Carruth < chandlerc at google.com > > wrote: > > > > On Tue, Jul 16, 2013 at 9:34 PM, Bob Wilson < bob.wilson at apple.com > > wrote: > > > > > > > > > > On Jul 16, 2013, at 5:51 PM, Eli Friedman < eli.friedman at gmail.com > > wrote: > > > > > On Tue, Jul 16, 2013 at 5:21 PM, Quentin Colombet < > qcolombet at apple.com > wrote: > > > > ** Advice Needed ** > > 1. Decide whether or not we want such capabilities (if we do not, we > may just > sporadically add support for a new warning/group of > warnings/errors). > 2. Come up with a plan to implement that (assuming we want it). > > The frontend should be presenting warnings, not the backend; adding a > hook which provides the appropriate information shouldn't be too > hard. > Warnings coming out of the backend are very difficult to design well, > so I don't expect we will add many. Also, keep in mind that the > information coming out of the backend could be used in other ways; it > might not make sense for the backend to decide that some piece of > information should be presented as a warning. (Consider, for example, > IDE integration to provide additional information about functions and > loops on demand.) > > I think we definitely need this.
In fact, I tried adding something > simple earlier this year but gave up when I realized that the task > was bigger than I expected. We already have a hook for diagnostics > that can be easily extended to handle warnings as well as errors > (which is what I tried earlier), but the problem is that it is > hardwired for inline assembly errors. To do this right, new warnings > really need to be associated with warning groups so that they can be > controlled from the front-end. > > > I agree with Eli that there probably won't be too many of these. > Adding a few new entries to clang's diagnostic .td files would be > fine, except that the backend doesn't see those. It seems like we > will need something in llvm that defines a set of "backend > diagnostics", along with a table in the frontend to correlate those > with the corresponding clang diagnostics. That seems awkward at best > but maybe it's tolerable as long as there aren't many of them. > > > I actually think this is the wrong approach, and I don't think it's > quite what Eli or I am suggesting (of course, Eli may want to > clarify; I'm only really clarifying what *I'm* suggesting). > > > I think all of the warnings should be in the frontend, using the > standard and existing machinery for generating, controlling, and > displaying a warning. We already know how to do that well. The > difference is that these warnings will need to query the LLVM layer > for detailed information through some defined API, and base the > warning on this information. This accomplishes two things: > > > 1) It ensures the warning machinery is simple, predictable, and > integrates cleanly with everything else in Clang. It does so in the > best way by simply being the existing machinery. > > > 2) It forces us to design reasonable APIs in LLVM to expose to a FE > for this information. A consequence of this will be to sort out the > layering issues, etc.
Another consequence will be a strong chance of > finding general purpose APIs in LLVM that can serve many purposes, > not just a warning. Consider JITs and other systems that might > benefit from having good APIs for querying the size and makeup (at a > high level) of a generated function. > > > A nice side-effect is that it simplifies the complexity involved for > simple warnings -- now it merely is the complexity of exposing the > commensurately simple API in LLVM. If instead we go the route of > threading a FE interface for *reporting* warnings into LLVM, we have > to thread an interface with sufficient power to express many > different concepts. > > I don't understand what you are proposing. > > > First, let me try to clarify my proposal, in case there was any > confusion about that. LLVMContext already has a hook for > diagnostics, setInlineAsmDiagnosticHandler() et al. I was suggesting > that we rename those interfaces to be more generic, add a simple > enumeration of whatever diagnostics can be produced from the > backend, and add support in clang for mapping those enumeration > values to the corresponding clang diagnostics. This would be a small > amount of work and would also be consistent with everything you > wrote above about reusing the standard and existing machinery for > diagnostics in clang. For the record, I had started down that path > in svn commits 171041 and 171047, but I reverted those changes in > 174748 and 174860, since they didn't go far enough to make it work > properly and it wasn't clear at the time whether we really needed > it. > > > Now let me try to understand what you're suggesting…. You refer > several times to having clang query the LLVM layer. Is this to > determine whether to emit a diagnostic for some condition? How would > this work? Would you have clang insert extra passes to check for > various conditions that might require diagnostics? 
I don't see how > else you would do it, since clang's interface to the backend just > sets up the PerFunctionPasses, PerModulePasses and CodeGenPasses > pass managers and then runs them. Assuming you did add some special > passes to check for problems, wouldn't those passes have to > duplicate a lot of effort in some cases to find the answers? Take > for example the existing warnings in > IntrinsicLowering::LowerIntrinsicCall. Those badly need to be > cleaned up. Would clang run a special pass to check for intrinsics > that are not supported by the target? That pass would need to be > implemented as part of clang so that it would have access to clang's > diagnostic machinery, but it would also need to know details about > what intrinsics are supported by the target. Interesting layering > problems there…. Apologies if I'm misinterpreting your proposal. > > We can't assume clang is the frontend or design a system that only > works with clang. There are many systems that use llvm which are not > even c compilers. 
> > > Evan > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory From chandlerc at google.com Sat Jul 20 15:55:10 2013 From: chandlerc at google.com (Chandler Carruth) Date: Sat, 20 Jul 2013 15:55:10 -0700 Subject: [LLVMdev] [RFC] Switching make check to use 'set -o pipefail' In-Reply-To: References: Message-ID: On Wed, Jul 17, 2013 at 6:48 PM, Rafael Espíndola < rafael.espindola at gmail.com> wrote: > > Hi Rafael, > > > > Did this discussion ever get a conclusion? I support enabling > > pipefail. Fallout for out of tree users should be easy to fix. As we > > learned from LLVM tests, almost all tests that start to fail actually > > indicate a real problem that was hidden. > > So far I got some positive feedback, but no strong LGTM from someone > in the area :-( > Ok, here is a strong LGTM. =] Please make the change, and do the following things to aid out-of-tree maintainers: 1) Add a flag to lit and an option to configure/make (I don't care about CMake here as that is much less frequently used for out-of-tree work) to disable pipefail. 2) Add a way to disable this behavior using the lit '.cfg' files, especially the 'lit.local.cfg' files.
This way out-of-tree targets can put such a .cfg file in their target's test directory and ignore this change. 3) Add some significant documentation for what this means to both the lit documentation and to the release notes so that folks are aware of the test infrastructure change when they pick up a giant pile of changes in the release Also, please send a note (in a new thread) to llvmdev when the switch is flipped with a reminder about how to disable the new behavior for folks that can't update their test suite. You'll probably want to flip the switch when you have time to track down lots of build bot failures. =D Thanks for making the test infrastructure better, -Chandler -------------- next part -------------- An HTML attachment was scrubbed... URL: From clattner at apple.com Sat Jul 20 21:15:35 2013 From: clattner at apple.com (Chris Lattner) Date: Sat, 20 Jul 2013 21:15:35 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> Message-ID: Sorry, just getting caught up on an old thread. I haven't been involved in discussions of this. On Jul 17, 2013, at 8:53 AM, Bob Wilson wrote: > First, let me try to clarify my proposal, in case there was any confusion about that. LLVMContext already has a hook for diagnostics, setInlineAsmDiagnosticHandler() et al. I was suggesting that we rename those interfaces to be more generic, add a simple enumeration of whatever diagnostics can be produced from the backend, and add support in clang for mapping those enumeration values to the corresponding clang diagnostics. This would be a small amount of work and would also be consistent with everything you wrote above about reusing the standard and existing machinery for diagnostics in clang. Of all of the proposals discussed, I like this the best: 1) This is a really simple extension of what we already have. 
2) The backend providing a set of enumerations for the classes of diagnostics it produces doesn't tie it to clang, and doesn't make it language specific. Clients should be able to completely ignore the enum if they want the current (unclassified) behavior, and if an unknown enum value comes through, it is easy to handle. 3) I don't see how something like the stack size diagnostic can be implemented by clang calling into the backend. First, the MachineFunction (and thus, MachineFrameInfo) is a transient datastructure used by the backend when a function is compiled. There is nothing persistent for clang to query. Second, clang would have to know about all of the LLVM IR functions generated, which is possible, but impractical to track for things like thunks and other implicitly generated entrypoints. What is the specific concern with this approach? I don't see how this couples the backend to the frontend or causes layering violation problems. -Chris From craig.topper at gmail.com Sun Jul 21 00:40:44 2013 From: craig.topper at gmail.com (Craig Topper) Date: Sun, 21 Jul 2013 00:40:44 -0700 Subject: [LLVMdev] fptoui calling a function that modifies ECX In-Reply-To: <51EA3FEF.8070708@uformia.com> References: <51E5F5ED.1000808@uformia.com> <51E8BBE9.8020407@uformia.com> <51E8CADA.1040506@uformia.com> <51E8CE40.3050703@uformia.com> <51E8D26B.3000108@uformia.com> <51E8D468.7030509@uformia.com> <51E8DE10.9090900@uformia.com> <51E8E123.2030402@uformia.com> <51E8E3D0.5040807@uformia.com> <51E8E994.3030400@uformia.com> <51E90826.3060201@uformia.com> <51EA3FEF.8070708@uformia.com> Message-ID: Committed in r186787 On Sat, Jul 20, 2013 at 12:44 AM, Peter Newman wrote: > I've applied this and the test cases I have here continue to work, so it > looks good to me. > > I've ran into another (seemingly unrelated) issue which I'll describe in a > separate email to the dev list. > > -- > Peter N > > > On 20/07/2013 5:30 AM, Craig Topper wrote: > > Here's my attempt at a fix. 
Adding Jakob to make sure I did this right. > > > On Fri, Jul 19, 2013 at 2:34 AM, Peter Newman wrote: > >> That does appear to have worked. All my tests are passing now. >> >> I'll hand this out to our other devs & testers and make sure it's working >> for them as well (not just on my machine). >> >> Thank you, again. >> >> -- >> Peter N >> >> >> On 19/07/2013 5:45 PM, Craig Topper wrote: >> >> I don't think that's going to work. >> >> >> On Fri, Jul 19, 2013 at 12:24 AM, Peter Newman wrote: >> >>> Thank you, I'm trying this now. >>> >>> >>> On 19/07/2013 5:23 PM, Craig Topper wrote: >>> >>> Try adding ECX to the Defs of this part of >>> lib/Target/X86/X86InstrCompiler.td like I've done below. I don't have a >>> Windows machine to test myself. >>> >>> let Defs = [EAX, EDX, ECX, EFLAGS], FPForm = SpecialFP in { >>> def WIN_FTOL_32 : I<0, Pseudo, (outs), (ins RFP32:$src), >>> "# win32 fptoui", >>> [(X86WinFTOL RFP32:$src)]>, >>> Requires<[In32BitMode]>; >>> >>> def WIN_FTOL_64 : I<0, Pseudo, (outs), (ins RFP64:$src), >>> "# win32 fptoui", >>> [(X86WinFTOL RFP64:$src)]>, >>> Requires<[In32BitMode]>; >>> } >>> >>> >>> On Thu, Jul 18, 2013 at 11:59 PM, Peter Newman wrote: >>> >>>> Oh, excellent point, I agree. My bad. Now that I'm not assuming those >>>> are the sqrt, I see the sqrtpd's in the output. Also there are three >>>> fptoui's and there are 3 call instances. >>>> >>>> (Changing subject line again.) >>>> >>>> Now it looks like it's bug #13862 >>>> >>>> On 19/07/2013 4:51 PM, Craig Topper wrote: >>>> >>>> I think those calls correspond to this >>>> >>>> %110 = fptoui double %109 to i32 >>>> >>>> The calls are followed by an imul with 12 which matches up with what >>>> occurs right after the fptoui in the IR. >>>> >>>> >>>> On Thu, Jul 18, 2013 at 11:48 PM, Peter Newman wrote: >>>> >>>>> Yes, that is the result of module-dump.ll >>>>> >>>>> >>>>> On 19/07/2013 4:46 PM, Craig Topper wrote: >>>>> >>>>> Does this correspond to one of the .ll files you sent earlier? 
>>>>> >>>>> >>>>> On Thu, Jul 18, 2013 at 11:34 PM, Peter Newman wrote: >>>>> >>>>>> (Changing subject line as diagnosis has changed) >>>>>> >>>>>> I'm attaching the compiled code that I've been getting, both with >>>>>> CodeGenOpt::Default and CodeGenOpt::None . The crash isn't occurring with >>>>>> CodeGenOpt::None, but that seems to be because ECX isn't being used - it >>>>>> still gets set to 0x7fffffff by one of the calls to 76719BA1 >>>>>> >>>>>> I notice that X86::SQRTPD[m|r] appear in >>>>>> X86InstrInfo::isHighLatencyDef. I was thinking an optimization might be >>>>>> removing it, but I don't get the sqrtpd instruction even if the createJIT >>>>>> optimization level turned off. >>>>>> >>>>>> I am trying this with the Release 3.3 code - I'll try it with trunk >>>>>> and see if I get a different result there. Maybe there was a recent commit >>>>>> for this. >>>>>> >>>>>> -- >>>>>> Peter N >>>>>> >>>>>> On 19/07/2013 4:00 PM, Craig Topper wrote: >>>>>> >>>>>> Hmm, I'm not able to get those .ll files to compile if I disable SSE >>>>>> and I end up with SSE instructions(including sqrtpd) if I don't disable it. >>>>>> >>>>>> >>>>>> On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman wrote: >>>>>> >>>>>>> Is there something specifically required to enable SSE? If it's >>>>>>> not detected as available (based from the target triple?) then I don't >>>>>>> think we enable it specifically. >>>>>>> >>>>>>> Also it seems that it should handle converting to/from the vector >>>>>>> types, although I can see it getting confused about needing to do that if >>>>>>> it thinks SSE isn't available at all. >>>>>>> >>>>>>> >>>>>>> On 19/07/2013 3:47 PM, Craig Topper wrote: >>>>>>> >>>>>>> Hmm, maybe sse isn't being enabled so its falling back to emulating >>>>>>> sqrt? 
>>>>>>> >>>>>>> >>>>>>> On Thu, Jul 18, 2013 at 10:45 PM, Peter Newman wrote: >>>>>>> >>>>>>>> In the disassembly, I'm seeing three cases of >>>>>>>> call 76719BA1 >>>>>>>> >>>>>>>> I am assuming this is the sqrt function as this is the only >>>>>>>> function called in the LLVM IR. >>>>>>>> >>>>>>>> The code at 76719BA1 is: >>>>>>>> >>>>>>>> 76719BA1 push ebp >>>>>>>> 76719BA2 mov ebp,esp >>>>>>>> 76719BA4 sub esp,20h >>>>>>>> 76719BA7 and esp,0FFFFFFF0h >>>>>>>> 76719BAA fld st(0) >>>>>>>> 76719BAC fst dword ptr [esp+18h] >>>>>>>> 76719BB0 fistp qword ptr [esp+10h] >>>>>>>> 76719BB4 fild qword ptr [esp+10h] >>>>>>>> 76719BB8 mov edx,dword ptr [esp+18h] >>>>>>>> 76719BBC mov eax,dword ptr [esp+10h] >>>>>>>> 76719BC0 test eax,eax >>>>>>>> 76719BC2 je 76719DCF >>>>>>>> 76719BC8 fsubp st(1),st >>>>>>>> 76719BCA test edx,edx >>>>>>>> 76719BCC js 7671F9DB >>>>>>>> 76719BD2 fstp dword ptr [esp] >>>>>>>> 76719BD5 mov ecx,dword ptr [esp] >>>>>>>> 76719BD8 add ecx,7FFFFFFFh >>>>>>>> 76719BDE sbb eax,0 >>>>>>>> 76719BE1 mov edx,dword ptr [esp+14h] >>>>>>>> 76719BE5 sbb edx,0 >>>>>>>> 76719BE8 leave >>>>>>>> 76719BE9 ret >>>>>>>> >>>>>>>> >>>>>>>> As you can see at 76719BD5, it modifies ECX . >>>>>>>> >>>>>>>> I don't know that this is the sqrtpd function (for example, I'm not >>>>>>>> seeing any SSE instructions here?) but whatever it is, it's being called >>>>>>>> from the IR I attached earlier, and is modifying ECX under some >>>>>>>> circumstances. >>>>>>>> >>>>>>>> >>>>>>>> On 19/07/2013 3:29 PM, Craig Topper wrote: >>>>>>>> >>>>>>>> That should map directly to sqrtpd which can't modify ecx. >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Jul 18, 2013 at 10:27 PM, Peter Newman wrote: >>>>>>>> >>>>>>>>> Sorry, that should have been llvm.x86.sse2.sqrt.pd >>>>>>>>> >>>>>>>>> >>>>>>>>> On 19/07/2013 3:25 PM, Craig Topper wrote: >>>>>>>>> >>>>>>>>> What is "frep.x86.sse2.sqrt.pd". I'm only familiar with things >>>>>>>>> prefixed with "llvm.x86". 
>>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Jul 18, 2013 at 10:12 PM, Peter Newman wrote: >>>>>>>>> >>>>>>>>>> After stepping through the produced assembly, I believe I have >>>>>>>>>> a culprit. >>>>>>>>>> >>>>>>>>>> One of the calls to @frep.x86.sse2.sqrt.pd is modifying the value >>>>>>>>>> of ECX - while the produced code is expecting it to still contain its >>>>>>>>>> previous value. >>>>>>>>>> >>>>>>>>>> Peter N >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 19/07/2013 2:09 PM, Peter Newman wrote: >>>>>>>>>> >>>>>>>>>> I've attached the module->dump() that our code is producing. >>>>>>>>>> Unfortunately this is the smallest test case I have available. >>>>>>>>>> >>>>>>>>>> This is before any optimization passes are applied. There are two >>>>>>>>>> separate modules in existence at the time, and there are no guarantees >>>>>>>>>> about the order the surrounding code calls those functions, so there may be >>>>>>>>>> some interaction between them? There shouldn't be, they don't refer to any >>>>>>>>>> common memory etc. There is no multi-threading occurring. >>>>>>>>>> >>>>>>>>>> The function in module-dump.ll (called crashfunc in this file) is >>>>>>>>>> called with >>>>>>>>>> - func_params 0x0018f3b0 double [3] >>>>>>>>>> [0x0] -11.339976634695301 double >>>>>>>>>> [0x1] -9.7504239056205506 double >>>>>>>>>> [0x2] -5.2900856817382804 double >>>>>>>>>> at the time of the exception. >>>>>>>>>> >>>>>>>>>> This is compiled on a "i686-pc-win32" triple. All of the >>>>>>>>>> non-intrinsic functions referred to in these modules are the standard >>>>>>>>>> equivalents from the MSVC library (e.g. @asin is the standard C lib >>>>>>>>>> double asin( double ) ). >>>>>>>>>> >>>>>>>>>> Hopefully this is reproducible for you. >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> PeterN >>>>>>>>>> >>>>>>>>>> On 18/07/2013 4:37 PM, Craig Topper wrote: >>>>>>>>>> >>>>>>>>>> Are you able to send any IR for others to reproduce this issue? 
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman >>>>>>>>> > wrote: >>>>>>>>>> >>>>>>>>>>> Unfortunately, this doesn't appear to be the bug I'm hitting. I >>>>>>>>>>> applied the fix to my source and it didn't make a difference. >>>>>>>>>>> >>>>>>>>>>> Also further testing found me getting the same behavior with >>>>>>>>>>> other SIMD instructions. The common factor is in each case, ECX is set to >>>>>>>>>>> 0x7fffffff, and it's an operation using xmm ptr ecx+offset . >>>>>>>>>>> >>>>>>>>>>> Additionally, turning the optimization level passed to createJIT >>>>>>>>>>> down appears to avoid it, so I'm now leaning towards a bug in one of the >>>>>>>>>>> optimization passes. >>>>>>>>>>> >>>>>>>>>>> I'm going to dig through the passes controlled by that parameter >>>>>>>>>>> and see if I can narrow down which optimization is causing it. >>>>>>>>>>> >>>>>>>>>>> Peter N >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 17/07/2013 1:58 PM, Solomon Boulos wrote: >>>>>>>>>>> >>>>>>>>>>>> As someone off list just told me, perhaps my new bug is the >>>>>>>>>>>> same issue: >>>>>>>>>>>> >>>>>>>>>>>> http://llvm.org/bugs/show_bug.cgi?id=16640 >>>>>>>>>>>> >>>>>>>>>>>> Do you happen to be using FastISel? >>>>>>>>>>>> >>>>>>>>>>>> Solomon >>>>>>>>>>>> >>>>>>>>>>>> On Jul 16, 2013, at 6:39 PM, Peter Newman >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hello all, >>>>>>>>>>>>> >>>>>>>>>>>>> I'm currently in the process of debugging a crash occurring in >>>>>>>>>>>>> our program. In LLVM 3.2 and 3.3 it appears that JIT generated code is >>>>>>>>>>>>> attempting to perform access unaligned memory with a SSE2 instruction. >>>>>>>>>>>>> However this only happens under certain conditions that seem (but may not >>>>>>>>>>>>> be) related to the stacks state on calling the function. >>>>>>>>>>>>> >>>>>>>>>>>>> Our program acts as a front-end, using the LLVM C++ API to >>>>>>>>>>>>> generate a JIT generated function. 
This function is primarily mathematical, >>>>>>>>>>>>> so we use the Vector types to take advantage of SIMD instructions (as well >>>>>>>>>>>>> as a few SSE2 intrinsics). >>>>>>>>>>>>> >>>>>>>>>>>>> This worked in LLVM 2.8 but started failing in 3.2 and has >>>>>>>>>>>>> continued to fail in 3.3. It fails with no optimizations applied to the >>>>>>>>>>>>> LLVM Function/Module. It crashes with what is reported as a memory access >>>>>>>>>>>>> error (accessing 0xffffffff), however it's suggested that this is how the >>>>>>>>>>>>> SSE fault raising mechanism appears. >>>>>>>>>>>>> >>>>>>>>>>>>> The generated instruction varies, but it seems to often be >>>>>>>>>>>>> similar to (I don't have it in front of me, sorry): >>>>>>>>>>>>> movapd xmm0, xmm[ecx+0x???????] >>>>>>>>>>>>> Where the xmm register changes, and the second parameter is a >>>>>>>>>>>>> memory access. >>>>>>>>>>>>> ECX is always set to 0x7ffffff - however I don't know if this >>>>>>>>>>>>> is part of the SSE error reporting process or is part of the situation >>>>>>>>>>>>> causing the error. >>>>>>>>>>>>> >>>>>>>>>>>>> I haven't worked out exactly what code path etc is causing >>>>>>>>>>>>> this crash. I'm hoping that someone can tell me if there were any changed >>>>>>>>>>>>> requirements for working with SIMD in LLVM 3.2 (or earlier, we haven't >>>>>>>>>>>>> tried 3.0 or 3.1). I currently suspect the use of GlobalVariable (we first >>>>>>>>>>>>> discovered the crash when using a feature that uses them), however I have >>>>>>>>>>>>> attempted using setAlignment on the GlobalVariables without any change. 
>>>>>>>>>>>>> -- >>>>>>>>>>>>> Peter N >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> LLVM Developers mailing list >>>>>>>>>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -- ~Craig -------------- next part -------------- An HTML attachment was scrubbed...
URL: From mkh159 at gmail.com Sun Jul 21 03:30:50 2013 From: mkh159 at gmail.com (m kh) Date: Sun, 21 Jul 2013 15:00:50 +0430 Subject: [LLVMdev] error on compiling toy-vm Message-ID: Hi all, The make command errors out: [toyVM ./tools/toyVM]: Generating frame tables initializer for .build/toyVM-binary.s [toyVM ./tools/toyVM]: Compiling .build/GenFrametables.cc [toyVM ./tools/toyVM]: Linking ../../Release/bin/toyVM clang: error: no such file or directory: '/home/user/vmkit/Release+Asserts/lib/Release/lib/libInlineMMTk.a' make[2]: *** [../../Release/bin/toyVM] Error 1 make[2]: Leaving directory `/home/user/vmkit/www/tuto/toy-vm2/tools/toyVM' make[1]: *** [all-subs] Error 2 make[1]: Leaving directory `/home/user/vmkit/www/tuto/toy-vm2/tools' make: *** [all-subs] Error 2 Best regards, Mkh -------------- next part -------------- An HTML attachment was scrubbed... URL: From pichet2000 at gmail.com Sun Jul 21 07:29:47 2013 From: pichet2000 at gmail.com (Francois Pichet) Date: Sun, 21 Jul 2013 10:29:47 -0400 Subject: [LLVMdev] Disable vectorization for unaligned data In-Reply-To: References: Message-ID: Ok any quick workaround to limit vectorization to 16-byte aligned 128-bit data then? All the memory copying done by ExpandUnalignedStore/ExpandUnalignedLoad is just too expensive. On Sat, Jul 20, 2013 at 12:52 PM, Arnold Schwaighofer < aschwaighofer at apple.com> wrote: > > On Jul 19, 2013, at 3:14 PM, Francois Pichet wrote: > > > > > What is the proper solution to disable auto-vectorization for unaligned > data? > > > > I have an out of tree target and I added this: > > > > bool OpusTargetLowering::allowsUnalignedMemoryAccesses(EVT VT, bool > *Fast) const { > > if (VT.isVector()) > > return false; > > .... > > } > > > > After that, I could see that vectorization is still done on unaligned > data except that llvm will copy the data back and forth from the source to > the top of the stack and work from there. This is very costly, I rather get > scalar operations. 
> > > > Then I tried to add: > > unsigned getMemoryOpCost(unsigned Opcode, Type *Src, > > unsigned Alignment, > > unsigned AddressSpace) const { > > if (Src->isVectorTy() && Alignment != 16) > > return 10000; // <== high number to try to avoid unaligned > load/store. > > return TargetTransformInfo::getMemoryOpCost(Opcode, Src, Alignment, > AddressSpace); > > } > > > > Except that this doesn't work because Alignment will always be 4 even > for data like: > > int data[16][16] __attribute__ ((aligned (16))), > > > > Because individual element are still 4-byte aligned. > > We will have to hook up some logic in the loop vectorizer that computes > the alignment of the vectorized version of the memory access so that we can > pass it to “getMemoryOpCost". Currently, as you have observed, we will just > pass the scalar loop’s memory access alignment which will be pessimistic. > > Instcombine will later replace the alignment to a stronger variant for > vectorized code but that is obviously to late for the cost model in the > vectorizer. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tanmx_star at yeah.net Sun Jul 21 09:49:47 2013 From: tanmx_star at yeah.net (Star Tan) Date: Mon, 22 Jul 2013 00:49:47 +0800 (CST) Subject: [LLVMdev] Analysis of polly-detect overhead in oggenc In-Reply-To: <51E31F3E.3080501@grosser.es> References: <51E02B5C.3030208@grosser.es> <727cd06.251b.13fd9059546.Coremail.tanmx_star@yeah.net> <51E19CAF.4050500@grosser.es> <73bc0067.345f.13fdb666d45.Coremail.tanmx_star@yeah.net> <51E2352A.6090706@grosser.es> <7aa4b9a.3bcb.13fdc808c5f.Coremail.tanmx_star@yeah.net> <32e1ec0e.464a.13fddb6c5c7.Coremail.tanmx_star@yeah.net> <51E31F3E.3080501@grosser.es> Message-ID: Hi all, I have attached a patch file to reduce the polly-detect overhead. My idea is to avoid calling TypeFinder in Non-DEBUG mode, so TypeFinder is only called in DEBUG mode with the DEBUG macro. 
This patch file does this with the following modifications:

First, it keeps most of the string information by replacing "<<" with "+" operations. For example, code like this:
INVALID(CFG, "Non branch instruction terminates BB: " + BB.getName());
would be converted into:
LastFailure = "Non branch instruction terminates BB: " + BB.getName().str();

Second, it simplifies some complex operations like:
INVALID(AffFunc, "Non affine branch in BB '" << BB.getName() << "' with LHS: " << *LHS << " and RHS: " << *RHS);
into:
LastFailure = "Non affine branch in BB: " + BB.getName().str();

In such cases, some information about "LHS" and "RHS" is lost. However, this has little effect on the variable "LastFailure", while all of the DEBUG information is kept. Since the variable "LastFailure" is only used in ScopGraphPrinter, which should only show critical information in the graph, I hope such a modification is acceptable. Results show that it brings almost the same performance improvement as my previous hack patch file, i.e., reducing the compile time of the Polly-detect pass from 90s (>80%) to 0.5s (2.5%) when compiling oggenc 16X. Best wishes, Star Tan

Postscript to Tobias: I have also implemented your initial proposal, which uses a class hierarchy to represent the different failure kinds. Unfortunately, I found it would significantly complicate the source code. It not only introduces a lot of dynamic object "new" and "copy" operations, but also makes the source code hard to understand. I argue that very detailed information for the variable "LastFailure" is not essential, because it should only show critical information in the graph. If users want detailed failure information, they should refer to the DEBUG output. This new patch file keeps most of the critical information for "LastFailure" except some details about Instructions and SCEVs. Do you have any suggestion?
At 2013-07-15 05:59:26,"Tobias Grosser" wrote: >On 07/14/2013 08:05 AM, Star Tan wrote: >> I have found that the extremely expensive compile-time overhead comes from the string buffer operation for "INVALID" MACRO in the polly-detect pass. >> Attached is a hack patch file that simply remove the string buffer operation. This patch file can significantly reduce compile-time overhead when compiling big source code. For example, for oggen*8.ll, the compile time is reduced from 40.5261 ( 51.2%) to 5.8813s (15.9%) with this patch file. > >Very nice analysis. I just tried it myself and can verify that for >oggenc 16x, your patch reduces the scop-detection time from 90 seconds >(80 %) to 0.5 seconds (2.5 %). > >I think there are two problems: > > 1) The cost of printing a single LLVM type/value increases with > the size of the overall Module. This seems to be because > TypeFinder::run() is called each time, without caching in place. > The cost of TypeFinder::run() increases with the size of the > module, as it basically just performs a scan on the entire Module. > > 2) We are formatting the failure messages during normal compilation, > even though they are only used in debugging tools like -view-scops > >In terms of solutions: > >It would be interesting to understand why is 1) so slow, especially as >it seems to be either a fundamental problem in LLVM IR printing or the >way we use the IR printing infrastructure. On the other side, for Polly >we need to solve 2) anyway. Even if formatting would be faster, we >should still not do it, if not needed. As we need to solve 2) anyway, 1) >will only hit us when we do debugging/formatting. I assume in case of >debugging the files we are looking into are normally smaller, such that >the formatting overhead will not be that big. > >Hence, I would focus on 2). 
We could probably just put the code under a >NDEBUG ifndef, but I would actually like to keep them available even in >NDEBUG mode, as we may want to use the error messages to hint users to >why their code can not be optimized. For this and also to get rid of >another annoyance, the INVALID macro, I think we need to restructure the >reporting of the last error, such that formatting of the error messages >can be done on-demand. Another problem that could be solved at this >point is to remove the macro use, which hides the fact that the >functions return as soon as INVALID is called, which is plainly ugly. > >I am not sure how to structure this, but I could imagine some small >class hierarchy that has a class for each error type. Each class just >stores pointers to the data structures it needs to format its error >message, but only formats the error on-demand. We could then return this >class in case of failure and return a NoError class or a NULL pointer in >case of success. > >This change may also help us to later add support to keep track of all >errors we encounter (not just the first one). This is something Andreas >and Johannes found helpful earlier. > >Cheers, >Tobias > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-ScopDetect-Optimize-compile-time-cost-for-string-ope.patch Type: application/octet-stream Size: 16775 bytes Desc: not available URL: From aschwaighofer at apple.com Sun Jul 21 10:16:40 2013 From: aschwaighofer at apple.com (Arnold Schwaighofer) Date: Sun, 21 Jul 2013 12:16:40 -0500 Subject: [LLVMdev] Disable vectorization for unaligned data In-Reply-To: References: Message-ID: <11687713-FD37-4A16-AC98-4C087463D2C8@apple.com> No, I am afraid not without computing alignment based on the scalar code. In order to limit vectorization to 16-byte aligned data we need to know that data is 16-byte aligned. 
The way we vectorize we won’t know that until after we have vectorized. As you have observed we will pass “4” to getMemoryOpCost in the loop vectorizer (as that is the only thing that can be inferred from a consecutive scalar access like “aligned_ptr += 32bit”). scalar code -> estimate cost based on scalar instructions -> vectorize -> vectorized code -> ... -> instcombine (calls ComputeMaskedBits) which computes better alignment for pointer accesses like “aligned_ptr += 128bit”. I will have to work on this soon as ARM also has pretty inefficient unaligned vector loads. On Jul 21, 2013, at 9:29 AM, Francois Pichet wrote: > Ok any quick workaround to limit vectorization to 16-byte aligned 128-bit data then? > > All the memory copying done by ExpandUnalignedStore/ExpandUnalignedLoad is just too expensive. > > > On Sat, Jul 20, 2013 at 12:52 PM, Arnold Schwaighofer wrote: > > On Jul 19, 2013, at 3:14 PM, Francois Pichet wrote: > > > > > What is the proper solution to disable auto-vectorization for unaligned data? > > > > I have an out of tree target and I added this: > > > > bool OpusTargetLowering::allowsUnalignedMemoryAccesses(EVT VT, bool *Fast) const { > > if (VT.isVector()) > > return false; > > .... > > } > > > > After that, I could see that vectorization is still done on unaligned data except that llvm will copy the data back and forth from the source to the top of the stack and work from there. This is very costly, I rather get scalar operations. > > > > Then I tried to add: > > unsigned getMemoryOpCost(unsigned Opcode, Type *Src, > > unsigned Alignment, > > unsigned AddressSpace) const { > > if (Src->isVectorTy() && Alignment != 16) > > return 10000; // <== high number to try to avoid unaligned load/store. 
> > return TargetTransformInfo::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace); > > } > > > > Except that this doesn't work because Alignment will always be 4 even for data like: > > int data[16][16] __attribute__ ((aligned (16))), > > > > Because individual element are still 4-byte aligned. > > We will have to hook up some logic in the loop vectorizer that computes the alignment of the vectorized version of the memory access so that we can pass it to “getMemoryOpCost". Currently, as you have observed, we will just pass the scalar loop’s memory access alignment which will be pessimistic. > > Instcombine will later replace the alignment to a stronger variant for vectorized code but that is obviously to late for the cost model in the vectorizer. > > From tobias at grosser.es Sun Jul 21 10:40:31 2013 From: tobias at grosser.es (Tobias Grosser) Date: Sun, 21 Jul 2013 10:40:31 -0700 Subject: [LLVMdev] Analysis of polly-detect overhead in oggenc In-Reply-To: References: <51E02B5C.3030208@grosser.es> <727cd06.251b.13fd9059546.Coremail.tanmx_star@yeah.net> <51E19CAF.4050500@grosser.es> <73bc0067.345f.13fdb666d45.Coremail.tanmx_star@yeah.net> <51E2352A.6090706@grosser.es> <7aa4b9a.3bcb.13fdc808c5f.Coremail.tanmx_star@yeah.net> <32e1ec0e.464a.13fddb6c5c7.Coremail.tanmx_star@yeah.net> <51E31F3E.3080501@grosser.es> Message-ID: <51EC1D0F.6000800@grosser.es> On 07/21/2013 09:49 AM, Star Tan wrote: > Hi all, > > > I have attached a patch file to reduce the polly-detect overhead. Great. > My idea is to avoid calling TypeFinder in Non-DEBUG mode, so > TypeFinder is only called in DEBUG mode with the DEBUG macro. This > patch file did this work with following modifications: > > > First, it keeps most of string information by replacing "<<" with "+" operation. 
For example, code like this: > INVALID(CFG, "Non branch instruction terminates BB: " + BB.getName()); > would be converted into: > LastFailure = "Non branch instruction terminates BB: " + BB.getName().str(); > > > Second, it simplifies some complex operations like: > INVALID(AffFunc, > "Non affine branch in BB '" << BB.getName() << "' with LHS: " > << *LHS << " and RHS: " << *RHS); > into: > LastFailure = "Non affine branch in BB: " + BB.getName().str(); > In such cases, some information for "LHS" and "RHS" are missed. Yes. And unfortunately, we may also lose the 'name' of unnamed basic blocks. Basic blocks without a name get formatted as %111 with '111' being their sequence number. Your above change will not be able to derive this number. > However, it only has little affects on the variable "LastFailure", > while keeping all information for DEBUG information. Why is that? It seems the DEBUG output is the very same that gets written to "LastFailure". > Since the > variable "LastFailure" is only used in ScopGraphPrinter, which should > only show critical information in graph, I hope such modification is > acceptable. Why should we only show critical information? In the GraphPrinter we do not worry about compile time so much, such that we can easily show helpful information. We just need to make sure that we do not slow down the compile-time in the generic case. > Results show that it has almost the same performance improvements as > my previous hack patch file, i.e., reducing the compile-time of > Polly-detect pass from 90s (>80%) to 0.5s (2.5%) when compiling oggenc > 16X. Great. > Postscript to Tobias: > I have also implemented your initial proposal, which uses some class > hierarchy to show different Failure information. Unfortunately, I > found it would significantly complicate the source code. It not only > introduces a lot of dynamic object "new" and "copy" operations, but > also makes source code hard to understand.
I argue that very > detailed information for the variable "LastFailure" is not essential > because it should only show critical information in the graph. If > users want to know detailed failure information, they should refer > to DEBUG information. This new patch file keeps most of critical > information for "LastFailure" except some detailed information about > Instruction and SCEV. Interesting. This was also something I was afraid of. Passing new/delete stuff through the scop detection is probably not what we want. > Do you have any suggestion? I do. Your patch goes in the right direction and it helped to get a better idea of what should be done. I think the first point that we learned is that passing class pointers around is probably too disturbing. I also believe having the formatting of the error messages in the normal scop detection is not great, as it not only adds unrelated stuff to the code, but also makes it harder to conditionally disable the error reporting. On the other hand, I believe it is good to make the 'return false' explicit. Hence, I propose to transform the code into something like the following: Instead of if (checkSomething()) INVALID(AffFunc, "Test" << SCEV <<); we should get something like: if (checkSomething()) { reportInvalidAffFunc(SCEV); return false; } The reportInvalidAffFunc is then either a NO-OP (during normal execution) or it reports the error (in case we need it). I am not yet fully sure what the reportInvalid* functions should look like. However, the part of your patch that is easy to commit without further discussion is to move the 'return false' out of the INVALID macro. Meaning, translating the above code to: if (checkSomething()) { INVALID(AffFunc, "Test" << SCEV <<); return false; } I propose to submit such a patch first and then focus on the remaining problems.
Cheers, Tobias From g.franceschetti at vidya.it Sun Jul 21 12:32:01 2013 From: g.franceschetti at vidya.it (Giorgio Franceschetti) Date: Sun, 21 Jul 2013 21:32:01 +0200 Subject: [LLVMdev] Build Clang and LLVM on Win 8 Message-ID: <51EC3731.1040104@vidya.it> Hi all, I'm new to Clang and LLVM and I'd like to use them on Win 8 with Code::Blocks. I'm having problems running CMake. I did the following: * Installed CMake * Installed Code::Blocks * Installed Python (CMake was complaining if it was not installed) * Downloaded sources from svn (LLVM, clang, compiler-rt and test-suite). When I run CMake I got the following error: \build>CMake -G "CodeBlocks - MinGW Makefiles" ..\llvm -- Could NOT find LibXml2 (missing: LIBXML2_LIBRARIES LIBXML2_INCLUDE_DIR) -- Target triple: x86_64-w64-mingw32 -- Native target architecture is X86 -- Threads enabled. -- Found PythonInterp: C:/Python33/python.exe (found version "3.3.2") -- Constructing LLVMBuild project information CMake Error at CMakeLists.txt:299 (message): Unexpected failure executing llvm-build: Traceback (most recent call last): File "/llvm/utils/llvm-build/llvm-build", line 3, in <module> import llvmbuild File "\llvm\utils\llvm-build\llvmbuild\__init__.py", line 1, in <module> from main import main ImportError: No module named 'main' -- Configuring incomplete, errors occurred! So I thought that I had missed configuring something. I tried to run the configure script as per the web instructions, but I got an error because the configure script is only for Linux: \build>..\llvm\configure --prefix="c:\llvm\tools" "..\llvm\configure" is not an internal or external command, an executable or batch file. Can anyone please help me? I downloaded the latest version of LLVM (3.4). Thanks in advance, Giorgio Franceschetti -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From renato.golin at linaro.org Sun Jul 21 14:03:19 2013 From: renato.golin at linaro.org (Renato Golin) Date: Sun, 21 Jul 2013 22:03:19 +0100 Subject: [LLVMdev] Disable vectorization for unaligned data In-Reply-To: <11687713-FD37-4A16-AC98-4C087463D2C8@apple.com> References: <11687713-FD37-4A16-AC98-4C087463D2C8@apple.com> Message-ID: If I got you right, this is the classic case for loop peeling. I thought LLVM's vectorizer had something like that already in. On 21 July 2013 18:16, Arnold Schwaighofer wrote: > I will have to work on this soon as ARM also has pretty inefficient > unaligned vector loads. > NEON does support unaligned access via VLD*/VST*, what loads are you referring to? cheers, --renato -------------- next part -------------- An HTML attachment was scrubbed... URL: From ofv at wanadoo.es Sun Jul 21 14:51:27 2013 From: ofv at wanadoo.es (=?utf-8?Q?=C3=93scar_Fuentes?=) Date: Sun, 21 Jul 2013 23:51:27 +0200 Subject: [LLVMdev] Build Clang and LLVM on Win 8 References: <51EC3731.1040104@vidya.it> Message-ID: <874nbnk2e8.fsf@wanadoo.es> Giorgio Franceschetti writes: > When I run cmake I got The following error: > *\build>CMake -G "CodeBlocks - MinGW Makefiles" ..\llvm* > /-- Could NOT find LibXml2 (missing: LIBXML2_LIBRARIES > LIBXML2_INCLUDE_DIR)// > //-- Target triple: x86_64-w64-mingw32// > //-- Native target architecture is X86// > //-- Threads enabled.// > //-- Found PythonInterp: C:/Python33/python.exe (found version "3.3.2")// > //-- Constructing LLVMBuild project information// > //CMake Error at CMakeLists.txt:299 (message):// > // Unexpected failure executing llvm-build: Traceback (most recent > call last):// I think that you installed the wrong version of Python. 
IIRC llvm-build requires Python 2.X From dwiberg at gmail.com Sun Jul 21 16:19:28 2013 From: dwiberg at gmail.com (David Wiberg) Date: Mon, 22 Jul 2013 01:19:28 +0200 Subject: [LLVMdev] Inst field in MSP430InstrFormats.td Message-ID: Hello, Within the file "MSP430InstrFormats.td" there is a class called "MSP430Inst" which has "Instruction" as superclass. Within this class there is a field called "Inst" (field bits<16> Inst;) which gets assigned in classes which specify a specific instruction format, e.g. IForm contains: let Inst{12-15} = opcode; let Inst{7} = ad.Value; let Inst{6} = bw; let Inst{4-5} = as.Value; From what I can see, these values do not propagate to the files generated by TableGen. I have tried removing the field and it is at least still possible to run llc on a simple input file. This leads to a couple of questions: 1. Is this field redundant or does it have some use which I have missed? 2. What is the status of the MSP430 backend? In a mail from 2012 (http://lists.ransford.org/pipermail/llvm-msp430/2012-February/000194.html) Anton indicates that it is usable, but is it a good basis for backend studies? My goal is to implement support for a new target and one step on the way is to fully understand how one of the existing backends works. Thanks David From aschwaighofer at apple.com Sun Jul 21 16:37:07 2013 From: aschwaighofer at apple.com (Arnold) Date: Sun, 21 Jul 2013 18:37:07 -0500 Subject: [LLVMdev] Disable vectorization for unaligned data In-Reply-To: References: <11687713-FD37-4A16-AC98-4C087463D2C8@apple.com> Message-ID: On Jul 21, 2013, at 4:03 PM, Renato Golin wrote: > If I got you right, this is the classic case for loop peeling. I thought LLVM's vectorizer had something like that already in. No, we don't have loop peeling. The problem is even more fundamental than this. 
In the vectorizer we pass the alignment of the scalar loop access, which is of course lower than what is required. We need to compute the alignment based on the first access only and the vector access size, but we don't do this at the moment. > > On 21 July 2013 18:16, Arnold Schwaighofer wrote: >> I will have to work on this soon as ARM also has pretty inefficient unaligned vector loads. > > NEON does support unaligned access via VLD*/VST*, what loads are you referring to? Yes, but they can be very slow depending on the alignment (more micro-ops). > > cheers, > --renato -------------- next part -------------- An HTML attachment was scrubbed... URL: From rnk at google.com Sun Jul 21 17:11:17 2013 From: rnk at google.com (Reid Kleckner) Date: Sun, 21 Jul 2013 20:11:17 -0400 Subject: [LLVMdev] Build Clang and LLVM on Win 8 In-Reply-To: <874nbnk2e8.fsf@wanadoo.es> References: <51EC3731.1040104@vidya.it> <874nbnk2e8.fsf@wanadoo.es> Message-ID: On Sun, Jul 21, 2013 at 5:51 PM, Óscar Fuentes wrote: > Giorgio Franceschetti writes: > > > When I run cmake I got The following error: > > *\build>CMake -G "CodeBlocks - MinGW Makefiles" ..\llvm* > > /-- Could NOT find LibXml2 (missing: LIBXML2_LIBRARIES > > LIBXML2_INCLUDE_DIR)// > > //-- Target triple: x86_64-w64-mingw32// > > //-- Native target architecture is X86// > > //-- Threads enabled.// > > //-- Found PythonInterp: C:/Python33/python.exe (found version "3.3.2")// > > //-- Constructing LLVMBuild project information// > > //CMake Error at CMakeLists.txt:299 (message):// > > // Unexpected failure executing llvm-build: Traceback (most recent > > call last):// > > I think that you installed the wrong version of Python. IIRC llvm-build > requires Python 2.X > There was a patch on the commit list to try to make some of our scripts work for both 2 and 3. I should dig it up and review it. My initial impression was that probably nobody uses Python 3 yet, so it's not worth adding support that will break. 
But if users actually have python 3, maybe it's worth it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From atrick at apple.com Sun Jul 21 17:56:32 2013 From: atrick at apple.com (Andrew Trick) Date: Sun, 21 Jul 2013 17:56:32 -0700 Subject: [LLVMdev] Does nounwind have semantics? Message-ID: <58281658-BDAE-4D12-859A-859308EE6FF8@apple.com> Does 'nounwind' have semantics that inform optimization passes? It seems to in some cases, but not consistently. For example... int32_t foo(int32_t* ptr) { int i = 0; int result; do { bar(ptr); result = *ptr; bar(ptr); } while (i++ < *ptr); return result; } Say we have a front end that declares bar as... declare void @bar(i32*) readonly; So 'bar' is 'readonly' and 'may-unwind'. When LICM tries to hoist the load it interprets the 'may-unwind' as "MayThrow" in LICM-language and bails. However, when it tries to sink the call itself it sees the 'readonly', assumes no side effects and sinks it below the loads. Hmm... There doesn't appear to be a way to declare a function that is guaranteed not to write to memory in a way that affects the caller, but may have another well-defined side effect like aborting the program. This is interesting, because that is the way runtime checks for safe languages would like to be defined. I'm perfectly happy telling front ends to generate control flow for well-defined traps, since I like lots of basic blocks in my IR. But I'm still curious how others deal with this. -Andy From eli.friedman at gmail.com Sun Jul 21 18:37:14 2013 From: eli.friedman at gmail.com (Eli Friedman) Date: Sun, 21 Jul 2013 18:37:14 -0700 Subject: [LLVMdev] Does nounwind have semantics? In-Reply-To: <58281658-BDAE-4D12-859A-859308EE6FF8@apple.com> References: <58281658-BDAE-4D12-859A-859308EE6FF8@apple.com> Message-ID: On Sun, Jul 21, 2013 at 5:56 PM, Andrew Trick wrote: > Does 'nounwind' have semantics that inform optimization passes? It seems to in some cases, but not consistently. 
For example... > > int32_t foo(int32_t* ptr) { > int i = 0; > int result; > do { > bar(ptr); > result = *ptr; > bar(ptr); > } while (i++ < *ptr); > return result; > } > > Say we have a front end that declares bar as... > > declare void @bar(i32*) readonly; > > So 'bar' is 'readonly' and 'may-unwind'. It's impossible to write a readonly function which throws a DWARF exception. Given that, I would imagine there are all sorts of bugs nobody has ever run into. -Eli From ofv at wanadoo.es Sun Jul 21 18:38:46 2013 From: ofv at wanadoo.es (Óscar Fuentes) Date: Mon, 22 Jul 2013 03:38:46 +0200 Subject: [LLVMdev] Build Clang and LLVM on Win 8 References: <51EC3731.1040104@vidya.it> <874nbnk2e8.fsf@wanadoo.es> Message-ID: <87y58zidax.fsf@wanadoo.es> Reid Kleckner writes: > My initial impression was that still probably nobody uses python 3, so it's > not worth adding support that will break. But if users actually have > python 3, maybe it's worth it. I think that in this case the problem was not people who actually have Python 3, but people who see Python as a requirement for building LLVM, go to python.org, and download the "most recent" version, i.e. Python 3, because they are unaware of the incompatibilities. Believe it or not, there are developers who don't know about the Python mess :-) If adding support for version 3 is problematic, a check that gives a helpful message would be a good start. If it can't be implemented in the Python scripts, it could be implemented in the cmake/configure scripts. BTW, http://llvm.org/docs/GettingStarted.html mentions Python as a requirement for the automated test suite (not for the build.) It says version >=2.4. A user reading that would assume that version 3.X is OK, or no Python at all if he only wishes to play with LLVM. 
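The failure Giorgio hit is reproducible without CMake at all: Python 3 removed implicit relative imports (PEP 328), so a package `__init__.py` written as `from main import main` — the Python 2 idiom in llvm-build's traceback — raises exactly the ImportError shown in his log. A minimal sketch (the throwaway package layout below is a stand-in, not the real llvm-build tree):

```python
# Recreates the "No module named 'main'" failure under Python 3.
# The fake package mimics an __init__.py that does a Python-2-style
# implicit relative import of a sibling module.
import os
import sys
import tempfile

pkg_root = tempfile.mkdtemp()
os.mkdir(os.path.join(pkg_root, "llvmbuild"))
with open(os.path.join(pkg_root, "llvmbuild", "main.py"), "w") as f:
    f.write("def main():\n    return 'ok'\n")
with open(os.path.join(pkg_root, "llvmbuild", "__init__.py"), "w") as f:
    f.write("from main import main\n")  # works on 2.x, fails on 3.x

sys.path.insert(0, pkg_root)
err_msg = ""
try:
    import llvmbuild
    import_ok = True        # Python 2 semantics: sibling found implicitly
except ImportError as exc:
    import_ok = False       # Python 3: absolute import of 'main' fails
    err_msg = str(exc)
    print("ImportError:", exc)

# The Python-3-compatible spelling is an explicit relative import:
#   from .main import main
```

Run under Python 3 this prints the same "No module named 'main'" message CMake surfaced, which is why a version check with a helpful error, as Óscar suggests, would go a long way.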
From tanmx_star at yeah.net Sun Jul 21 19:33:48 2013 From: tanmx_star at yeah.net (Star Tan) Date: Mon, 22 Jul 2013 10:33:48 +0800 (CST) Subject: [LLVMdev] Analysis of polly-detect overhead in oggenc In-Reply-To: <51EC1D0F.6000800@grosser.es> References: <51E02B5C.3030208@grosser.es> <727cd06.251b.13fd9059546.Coremail.tanmx_star@yeah.net> <51E19CAF.4050500@grosser.es> <73bc0067.345f.13fdb666d45.Coremail.tanmx_star@yeah.net> <51E2352A.6090706@grosser.es> <7aa4b9a.3bcb.13fdc808c5f.Coremail.tanmx_star@yeah.net> <32e1ec0e.464a.13fddb6c5c7.Coremail.tanmx_star@yeah.net> <51E31F3E.3080501@grosser.es> <51EC1D0F.6000800@grosser.es> Message-ID: <1f586e46.3cca.1400439c171.Coremail.tanmx_star@yeah.net> At 2013-07-22 01:40:31,"Tobias Grosser" wrote: >On 07/21/2013 09:49 AM, Star Tan wrote: >> Hi all, >> >> >> I have attached a patch file to reduce the polly-detect overhead. > >Great. > >> My idea is to avoid calling TypeFinder in Non-DEBUG mode, so >> TypeFinder is only called in DEBUG mode with the DEBUG macro. This >> patch file did this work with following modifications: >> >> >> First, it keeps most of string information by replacing "<<" with "+" operation. For example, code like this: >> INVALID(CFG, "Non branch instruction terminates BB: " + BB.getName()); >> would be converted into: >> LastFailure = "Non branch instruction terminates BB: " + BB.getName().str(); >> >> >> Second, it simplifies some complex operations like: >> INVALID(AffFunc, >> "Non affine branch in BB '" << BB.getName() << "' with LHS: " >> << *LHS << " and RHS: " << *RHS); >> into: >> LastFailure = "Non affine branch in BB: " + BB.getName().str(); > > >> In such cases, some information for "LHS" and "RHS" are missed. > >Yes. And unfortunately, we may also loose the 'name' of unnamed basic >blocks. Basic blocks without a name get formatted as %111 with '111' >being their sequence number. Your above change will not be able to >derive this number. Yes, but it is much cheaper by using BB.getName().str(). 
Currently, I cannot find a better way to derive unnamed basic block without calling TypeFinder. > >> However, it only has little affects on the variable "LastFailure", >> while keeping all information for DEBUG information. > >Why is that? It seems the DEBUG output is the very same that gets >written to "LastFailure". No, they are different. For example, the code like this: INVALID(AffFunc, "Non affine branch in BB '" << BB.getName() << "' with LHS: " << *LHS << " and RHS: " << *RHS); would be converted to: LastFailure = "Non affine branch in BB: " + BB.getName().str(); INVALID(AffFunc, LastFailure << "' with LHS: " << *LHS << " and RHS: " << *RHS); You see, information about LHS and RHS would be kept in INVALID DEBUG information. To keep the information about unnamed basic blocks, I think we can rewrite it as: FailString = "Non affine branch in BB: "; INVALID(AffFunc, FailString << BB.getName() << "' with LHS: " << *LHS << " and RHS: " << *RHS); LastFailure = FailString + BB.getName().str(); > >> Since the >> variable "LastFailure" is only used in ScopGraphPrinter, which should >> only show critical information in graph, I hope such modification is >> acceptable. > >Why should we only show critical information? In the GraphPrinter we do >not worry about compile time so much, such that we can easily show >helpful information. We just need to make sure that we do not slow down >the compile-time in the generic case. To my knowledge, all of those expensive operations are only used for the variable "LastFailure", which is only used in GraphPrinter. Do you mean the Graph Printer does not execute in the generic case? If that is true, I think we should control those operations for "LastFailure" as follows: if (In GraphPrinter mode) { LastFailure = ... } In this way, we can completely remove those operations for "LastFailure" in the generic case except in GraphPrinter mode. 
> >> Results show that it has almost the same performance improvements as >> my previous hack patch file, i.e., reducing the compile-time of >> Polly-detect pass from 90s (>80%) to 0.5s (2.5%) when compiling oggenc >> 16X. > >Great. > >> Postscript to Tobias: >> I have also implemented your initial proposal, which uses some class >> hierarchy to show different Failure information. Unfortunately, I >> found it would significantly complicate the source code. It not only >> introduces a lot of dynamic object "new" and "copy" operations, but >> also makes source code hard to understand. I argue that very >> detailed information for the variable "LastFailure" is not essential >> because it should only show critical information in the graph. If >> users want to know detailed failure information, they should refer >> to DEBUG information. This new patch file keeps most of critical >> information for "LastFailure" except some detailed information about >> Instruction and SCEV. > >Interesting. This was also something I was afraid of. Passing new/delete >stuff through the scop detection is probably not what we want. > >> Do you have any suggestion? > >I do. > >Your patch goes in the right direction and it helped to get a better >idea of what should be done. I think the first point that we learned is >that passing class pointers around is probably too distrubing. I also >believe having the formatting of the error messages in the normal scop >detection is not great, as it both adds unrelated stuff to the code, but >also makes it harder to conditionally disable the error reporting. On >the other side, I believe it is good to make the 'return false' >explicit. 
> >Hence, I propose to transform the code in something like the following: > >Instead of > > if (checkSomething()) > INVALID(AffFunc, "Test" << SCEV <<); > >we should get something like: > > if (checkSomething()) { > reportInvalidAffFunc(SCEV); > return false; > } > >The reportInvalidAffFunc is then either a NO-OP (during normal >execution) or it reports the error (in case we need it). What do you mean by "normal execution"? Does the GraphPrinter execute in the normal case? If the GraphPrinter does not execute in the normal case, we should move all operations for "LastFailure" under the condition of "GraphPrinter" mode. > >I am not yet fully sure how the reportInvalid* functions should look >like. However, the part of your patch that is easy to commit without >further discussion is to move the 'return false' out of the INVALID >macro. Meaning, translating the above code to: > > if (checkSomething()) { > INVALID(AffFunc, "Test" << SCEV <<); > return false; > } > >I propose to submit such a patch first and then focus on the remaining >problems. Yes, I agree with you. I have attached a patch file that moves "return false" out of the INVALID macro. > >Cheers, >Tobias Thanks, Star Tan -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-ScopDetect-move-return-false-out-of-INVALID-macro.patch Type: application/octet-stream Size: 9214 bytes Desc: not available URL: From echristo at gmail.com Sun Jul 21 20:00:23 2013 From: echristo at gmail.com (Eric Christopher) Date: Sun, 21 Jul 2013 20:00:23 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> Message-ID: On Sat, Jul 20, 2013 at 9:15 PM, Chris Lattner wrote: > Sorry, just getting caught up on an old thread. I haven't been involved in discussions of this. 
> > On Jul 17, 2013, at 8:53 AM, Bob Wilson wrote: >> First, let me try to clarify my proposal, in case there was any confusion about that. LLVMContext already has a hook for diagnostics, setInlineAsmDiagnosticHandler() et al. I was suggesting that we rename those interfaces to be more generic, add a simple enumeration of whatever diagnostics can be produced from the backend, and add support in clang for mapping those enumeration values to the corresponding clang diagnostics. This would be a small amount of work and would also be consistent with everything you wrote above about reusing the standard and existing machinery for diagnostics in clang. > > Of all of the proposals discussed, I like this the best: > > 1) This is a really simple extension of what we already have. > > 2) The backend providing a set of enumerations for the classes of diagnostics it produces doesn't tie it to clang, and doesn't make it language specific. Clients should be able to completely ignore the enum if they want the current (unclassified) behavior, and if an unknown enum value comes through, it is easy to handle. > > 3) I don't see how something like the stack size diagnostic can be implemented by clang calling into the backend. First, the MachineFunction (and thus, MachineFrameInfo) is a transient datastructure used by the backend when a function is compiled. There is nothing persistent for clang to query. Second, clang would have to know about all of the LLVM IR functions generated, which is possible, but impractical to track for things like thunks and other implicitly generated entrypoints. > > What is the specific concern with this approach? I don't see how this couples the backend to the frontend or causes layering violation problems. 
> I've not talked with Chandler about this, but to sketch out the way I'd do it (which is similar): Have the backend vend diagnostics, this can be done either with a set of enums and messages like you mentioned, or just have a message and location struct ala: struct Msg { const char *Message; Location Loc; }; that the consumer of the message can use via a handler. Alternately a handler (and we should have a default handler) can be passed in from the printer of the message (the frontend in the case provided) and it can be called on the error message. Absolutely this should be done via the LLVMContext to deal with the case of parallel function passes. class Handler { void printDiagnostic(const char *Message, Location Loc); }; (Note that I didn't say this was a fleshed out design ;) I think I prefer the latter to the former and we'd just need an "diagnostic callback handler" on the context. Though we would need to keep a set of diagnostics that the backend handles. That said, that it provides diagnostics based on its input language seems to make the most sense. It can use the location metadata if it has it to produce a location, otherwise you get the function, etc. in some sort of nicely degraded quality. I think this scheme could also work as a way of dealing with the "Optimization Diary" sort of use that Hal is envisioning as well. Keeping the separation of concerns around where the front end handles diagnostics on what we'll call "source locations" is pretty important, however, I agree that not every warning can be expressed this way, i.e. the stack size diagnostic. However, leaving the printing up to the front end is the best way to deal with this and a generic diagnostic engine would probably help for things like llc/opt where the backend just deals with its input language - IR. The existing inline asm diagnostics are ... problematic and it would definitely be nice to get a generic interface for them. 
Though they're actually separated into two separate cases where, I think, we end up with our confusion: a) Front end diagnostics - This is an area that needs some work to be decent, but it involves the front end querying the back end for things like register size, valid immediates, etc and should be implemented by the front end with an expanded set of target queries. We could use this as a way to solidify the backend api for the MS inline asm support as well and use some of that when parsing GNU style inline asm. b) Back end diagnostics - This is the stuff that the front end has no hope of diagnosing. i.e. "ran out of registers", or "can't figure out how to split this up into this kind of vector register". The latter has always been a bit funny and I'm always unhappy with it, but I don't have any better ideas. A unified scheme of communicating "help help I'm being oppressed by the front end" in the backend would be, at the very least, a step forward. Thoughts? -eric From nicholas at mxc.ca Sun Jul 21 21:07:56 2013 From: nicholas at mxc.ca (Nick Lewycky) Date: Sun, 21 Jul 2013 21:07:56 -0700 Subject: [LLVMdev] Does nounwind have semantics? In-Reply-To: <58281658-BDAE-4D12-859A-859308EE6FF8@apple.com> References: <58281658-BDAE-4D12-859A-859308EE6FF8@apple.com> Message-ID: <51ECB01C.5020605@mxc.ca> Andrew Trick wrote: > Does 'nounwind' have semantics that inform optimization passes? It seems to in some cases, but not consistently. For example... > > int32_t foo(int32_t* ptr) { > int i = 0; > int result; > do { > bar(ptr); > result = *ptr; > bar(ptr); > } while (i++< *ptr); > return result; > } > > Say we have a front end that declares bar as... > > declare void @bar(i32*) readonly; > > So 'bar' is 'readonly' and 'may-unwind'. > > When LICM tries to hoist the load it interprets the 'may-unwind' as "MayThrow" in LICM-language and bails. However, when it tries to sink the call itself it sees the 'readonly', assumes no side effects and sinks it below the loads. Hmm... 
> There doesn't appear to be a way to declare a function that is guaranteed not to write to memory in a way that affects the caller, but may have another well-defined side effect like aborting the program. This is interesting, because that is the way runtime checks for safe languages would like to be defined. I'm perfectly happy telling front ends to generate control flow for well-defined traps, since I like lots of basic blocks in my IR. But I'm still curious how others deal with this. Yes, we went through a phase where people would try to use "nounwind+readonly == no side-effects" to optimize. All such optimizations are wrong. Unless otherwise proven, a function may inf-loop, terminate the program, or longjmp. I tried to add 'halting' to help solve part of this a long time ago, but it never went in. The problem is that determining whether you have loops requires a FunctionPass (LoopInfo to find loops and SCEV to determine an upper bound) and applying function attributes is an SCC operation (indeed, an SCC is itself a loop), so it's all blocked behind fixing the PassManager to allow CGSCCPasses to depend on FunctionPasses. http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20100705/103670.html I'm now in a similar situation where I want 'nounwind' to mean "only exits by terminating the program or a return instruction", but unfortunately functions which longjmp are considered nounwind. I would like to change LLVM to make longjmp'ing a form of unwinding (an exceptional exit to the function), but if I were to apply that rule today then we'd start putting DWARF EH tables on all our C code, oops. 
Nick From tobias at grosser.es Sun Jul 21 21:16:53 2013 From: tobias at grosser.es (Tobias Grosser) Date: Sun, 21 Jul 2013 21:16:53 -0700 Subject: [LLVMdev] Analysis of polly-detect overhead in oggenc In-Reply-To: <1f586e46.3cca.1400439c171.Coremail.tanmx_star@yeah.net> References: <51E02B5C.3030208@grosser.es> <727cd06.251b.13fd9059546.Coremail.tanmx_star@yeah.net> <51E19CAF.4050500@grosser.es> <73bc0067.345f.13fdb666d45.Coremail.tanmx_star@yeah.net> <51E2352A.6090706@grosser.es> <7aa4b9a.3bcb.13fdc808c5f.Coremail.tanmx_star@yeah.net> <32e1ec0e.464a.13fddb6c5c7.Coremail.tanmx_star@yeah.net> <51E31F3E.3080501@grosser.es> <51EC1D0F.6000800@grosser.es> <1f586e46.3cca.1400439c171.Coremail.tanmx_star@yeah.net> Message-ID: <51ECB235.4060901@grosser.es> On 07/21/2013 07:33 PM, Star Tan wrote: > At 2013-07-22 01:40:31,"Tobias Grosser" wrote: > >> On 07/21/2013 09:49 AM, Star Tan wrote: >>> Hi all, >>> >>> >>> I have attached a patch file to reduce the polly-detect overhead. >> >> Great. >> >>> My idea is to avoid calling TypeFinder in Non-DEBUG mode, so >>> TypeFinder is only called in DEBUG mode with the DEBUG macro. This >>> patch file did this work with following modifications: >>> >>> >>> First, it keeps most of string information by replacing "<<" with "+" operation. For example, code like this: >>> INVALID(CFG, "Non branch instruction terminates BB: " + BB.getName()); >>> would be converted into: >>> LastFailure = "Non branch instruction terminates BB: " + BB.getName().str(); >>> >>> >>> Second, it simplifies some complex operations like: >>> INVALID(AffFunc, >>> "Non affine branch in BB '" << BB.getName() << "' with LHS: " >>> << *LHS << " and RHS: " << *RHS); >>> into: >>> LastFailure = "Non affine branch in BB: " + BB.getName().str(); >> >> >>> In such cases, some information for "LHS" and "RHS" are missed. >> >> Yes. And unfortunately, we may also loose the 'name' of unnamed basic >> blocks. 
Basic blocks without a name get formatted as %111 with '111' >> being their sequence number. Your above change will not be able to >> derive this number. > Yes, but it is much cheaper by using BB.getName().str(). > Currently, I cannot find a better way to derive unnamed basic block without calling TypeFinder. Yes, that's the reason why it was used in the first place. >>> However, it only has little affects on the variable "LastFailure", >>> while keeping all information for DEBUG information. >> >> Why is that? It seems the DEBUG output is the very same that gets >> written to "LastFailure". > No, they are different. For example, the code like this: > INVALID(AffFunc, > "Non affine branch in BB '" << BB.getName() << "' with LHS: " > << *LHS << " and RHS: " << *RHS); > would be converted to: > LastFailure = "Non affine branch in BB: " + BB.getName().str(); > INVALID(AffFunc, > LastFailure << "' with LHS: " << *LHS << " and RHS: " << *RHS); > You see, information about LHS and RHS would be kept in INVALID DEBUG information. > To keep the information about unnamed basic blocks, I think we can rewrite it as: > FailString = "Non affine branch in BB: "; > INVALID(AffFunc, > FailString << BB.getName() << "' with LHS: " > << *LHS << " and RHS: " << *RHS); > LastFailure = FailString + BB.getName().str(); >> >>> Since the >>> variable "LastFailure" is only used in ScopGraphPrinter, which should >>> only show critical information in graph, I hope such modification is >>> acceptable. >> >> Why should we only show critical information? In the GraphPrinter we do >> not worry about compile time so much, such that we can easily show >> helpful information. We just need to make sure that we do not slow down >> the compile-time in the generic case. > To my knowledge, all of those expensive operations are only used for the variable "LastFailure", which is only used in GraphPrinter. Do you mean the Graph Printer does not execute in the generic case? 
If that is true, I think we should control those operations for "LastFailure" as follows: > if (In GraphPrinter mode) { > LastFailure = ... > } > In this way, we can completely remove those operations for "LastFailure" in the generic case except in GraphPrinter mode. Yes. > What do you mean with "Normal Execution"? Does the GraphPrinter executes in "normal case"? > If GraphPrinter is not executes in "normal case", we should move all operations for "LastFailure" under the condition of "GraphPrinter" mode. It is not executed in normal mode. So yes, we should move all this under the condition of 'GraphPrintingMode'. >> I am not yet fully sure how the reportInvalid* functions should look >> like. However, the part of your patch that is easy to commit without >> further discussion is to move the 'return false' out of the INVALID >> macro. Meaning, translating the above code to: >> >> if (checkSomething()) { >> INVALID(AffFunc, "Test" << SCEV <<); >> return false; >> } >> >> I propose to submit such a patch first and then focus on the remaining >> problems. > Yes, I agree with you. I have attached a patch file to move "return false" out of INVALID macro. Great. Committed in r186805. I propose two more patches: 1) Transform the INVALID macro into function calls that format the text and set LastFailure. 2) Add checks at the beginning of those function calls and continue only if LogErrors is set. The second one is slightly more involved, as we would like to turn this option on automatically either in -debug mode or if -polly-show or -polly-show-only is set. What do you think? Does this sound right? Cheers, Tobias From michael.m.kuperstein at intel.com Sun Jul 21 22:19:12 2013 From: michael.m.kuperstein at intel.com (Kuperstein, Michael M) Date: Mon, 22 Jul 2013 05:19:12 +0000 Subject: [LLVMdev] Does nounwind have semantics? 
In-Reply-To: <51ECB01C.5020605@mxc.ca> References: <58281658-BDAE-4D12-859A-859308EE6FF8@apple.com> <51ECB01C.5020605@mxc.ca> Message-ID: <251BD6D4E6A77E4586B482B33960D2283360813B@HASMSX106.ger.corp.intel.com> I'm not sure I understand why it's blocked on that, by the way. Even if we can't apply the attribute ourselves, I don't see why we wouldn't expose that ability to frontends. I'm not entirely sure "halting" is the right attribute either, by the way. What I, personally, would like to see is a way to specify a function call is safe to speculatively execute. That implies readnone (not just readonly), nounwind, halting - and Eris knows what else. Nick, is that too strong for you? Michael -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Nick Lewycky Sent: Monday, July 22, 2013 07:08 To: Andrew Trick Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Does nounwind have semantics? Andrew Trick wrote: > Does 'nounwind' have semantics that inform optimization passes? It seems to in some cases, but not consistently. For example... > > int32_t foo(int32_t* ptr) { > int i = 0; > int result; > do { > bar(ptr); > result = *ptr; > bar(ptr); > } while (i++< *ptr); > return result; > } > > Say we have a front end that declares bar as... > > declare void @bar(i32*) readonly; > > So 'bar' is 'readonly' and 'may-unwind'. > > When LICM tries to hoist the load it interprets the 'may-unwind' as "MayThrow" in LICM-language and bails. However, when it tries to sink the call itself it sees the 'readonly', assumes no side effects and sinks it below the loads. Hmm... > > There doesn't appear to be a way to declare a function that is guaranteed not to write to memory in a way that affects the caller, but may have another well-defined side effect like aborting the program. This is interesting, because that is the way runtime checks for safe languages would like to be defined. 
I'm perfectly happy telling front ends to generate control flow for well-defined traps, since I like lots of basic blocks in my IR. But I'm still curious how others deal with this. Yes, we went through a phase where people would try to use "nounwind+readonly == no side-effects" to optimize. All such optimizations are wrong. Unless otherwise proven, a function may inf-loop, terminate the program, or longjmp. I tried to add 'halting' to help solve part of this a long time ago, but it never went in. The problem is that determining whether you have loops requires a FunctionPass (LoopInfo to find loops and SCEV to determine an upper bound) and applying function attributes is an SCC operation (indeed, an SCC is itself a loop), so it's all blocked behind fixing the PassManager to allow CGSCCPasses to depend on FunctionPasses. http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20100705/103670.html I'm now in a similar situation where I want 'nounwind' to mean "only exits by terminating the program or a return instruction" but unfortunately functions which longjmp are considered nounwind. I would like to change LLVM to make longjmp'ing a form of unwinding (an exceptional exit to the function), but if I were to apply that rule today then we'd start putting dwarf eh tables on all our C code, oops. Nick _______________________________________________ LLVM Developers mailing list LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev --------------------------------------------------------------------- Intel Israel (74) Limited This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.
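Nick's warning — that readonly plus the absence of nounwind does not add up to "no side effects" — can be made concrete without any LLVM machinery. The C sketch below is illustrative only (the names and the use of setjmp/longjmp to stand in for stack unwinding are mine, not from the thread): bar reads memory but never writes it, yet it may "unwind" past its caller, so sinking the call below the load changes which operations execute.

```c
#include <setjmp.h>

static jmp_buf unwind_point;
static int loads_executed;        /* observable effect of the load */

/* Reads memory, never writes it - yet it may "unwind" past the
   caller. This is the readonly + may-unwind combination at issue. */
static void bar(const int *ptr) {
    if (*ptr == 0)
        longjmp(unwind_point, 1);
}

/* Original order: bar() runs before the load. */
static int foo_as_written(int *ptr) {
    if (setjmp(unwind_point))
        return -1;                /* unwound: the load never ran */
    bar(ptr);
    loads_executed++;             /* stands in for 'result = *ptr' */
    return *ptr;
}

/* The unsound transformation: the call sunk below the load. */
static int foo_call_sunk(int *ptr) {
    if (setjmp(unwind_point))
        return -1;
    loads_executed++;             /* the load now runs first... */
    int result = *ptr;
    bar(ptr);                     /* ...even when bar() unwinds */
    return result;
}
```

With `*ptr == 0`, foo_as_written performs no load at all (bar unwinds first) while foo_call_sunk performs one: the two orders are observably different, which is exactly why readonly alone cannot justify the sink.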
From baldrick at free.fr Sun Jul 21 23:55:44 2013 From: baldrick at free.fr (Duncan Sands) Date: Mon, 22 Jul 2013 08:55:44 +0200 Subject: [LLVMdev] Does nounwind have semantics? In-Reply-To: <58281658-BDAE-4D12-859A-859308EE6FF8@apple.com> References: <58281658-BDAE-4D12-859A-859308EE6FF8@apple.com> Message-ID: <51ECD770.9090809@free.fr> Hi Andrew, On 22/07/13 02:56, Andrew Trick wrote: > Does 'nounwind' have semantics that inform optimization passes? It seems to in some cases, but not consistently. For example... > > int32_t foo(int32_t* ptr) { > int i = 0; > int result; > do { > bar(ptr); > result = *ptr; > bar(ptr); > } while (i++ < *ptr); > return result; > } > > Say we have a front end that declares bar as... > > declare void @bar(i32*) readonly; > > So 'bar' is 'readonly' and 'may-unwind'. > > When LICM tries to hoist the load it interprets the 'may-unwind' as "MayThrow" in LICM-language and bails. However, when it tries to sink the call itself it sees the 'readonly', assumes no side effects and sinks it below the loads. Hmm... is your worry here about the following case? - the load will trap if executed - bar throws an exception Thus with the original code the trap will not occur, because an exception will be thrown first, while if you move the first bar call below the load then the trap will occur. > > There doesn't appear to be a way to declare a function that is guaranteed not to write to memory in a way that affects the caller, but may have another well-defined side effect like aborting the program. I'm pretty sure that exiting the program is considered to write memory, so bar can't do that itself. Ciao, Duncan. This is interesting, because that is the way runtime checks for safe languages would like to be defined. I'm perfectly happy telling front ends to generate control flow for well-defined traps, since I like lots of basic blocks in my IR. But I'm still curious how others deal with this.
> > -Andy > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From baldrick at free.fr Mon Jul 22 00:13:31 2013 From: baldrick at free.fr (Duncan Sands) Date: Mon, 22 Jul 2013 09:13:31 +0200 Subject: [LLVMdev] Does nounwind have semantics? In-Reply-To: References: <58281658-BDAE-4D12-859A-859308EE6FF8@apple.com> Message-ID: <51ECDB9B.6080909@free.fr> Hi Eli, On 22/07/13 03:37, Eli Friedman wrote: > On Sun, Jul 21, 2013 at 5:56 PM, Andrew Trick wrote: >> Does 'nounwind' have semantics that inform optimization passes? It seems to in some cases, but not consistently. For example... >> >> int32_t foo(int32_t* ptr) { >> int i = 0; >> int result; >> do { >> bar(ptr); >> result = *ptr; >> bar(ptr); >> } while (i++ < *ptr); >> return result; >> } >> >> Say we have a front end that declares bar as... >> >> declare void @bar(i32*) readonly; >> >> So 'bar' is 'readonly' and 'may-unwind'. > > It's impossible to write a readonly function which throws a DWARF > exception. Given that, I would imagine there are all sorts of bugs > nobody has ever run into. very true. That said, it is a pity not to support other ways of doing exception handling. For example, you could declare every function to return two results, the usual one and an additional i1 result. If the i1 value returned is 1 then this means that "an exception is being unwound", and the caller should jump to the landing pad if there is one; and if there isn't then it itself should return 1. This scheme doesn't write memory. OK, now imagine implementing this scheme where the additional return value is hidden, only implicitly present. You can argue that the function still doesn't write memory, though I admit you could also argue that the only way we have to model this extra parameter is to say that the function writes memory. 
What is a bit more problematic is that there is then also an implicit control flow construct ("if exception_value == 1 then return 1; end if") after every call. If everything was explicit then the above code bar(ptr); result = *ptr; bar(ptr); would be exc = bar(ptr); if (exc) return 1; result = *ptr; exc = bar(ptr); if (exc) return 1; At this point it is clear that because bar is readonly, the second bar call can be dropped, giving exc = bar(ptr); if (exc) return 1; result = *ptr; However it would have been wrong to drop the first bar call. This is all obvious when there are no hidden parameters and everything is explicit. I can't help feeling that either we should require everything to be explicit like this, or if it is implicit then probably we should say that bar does write memory (the invisible return parameter). But even then the implicit control flow after the bar call isn't modelled (already the case with the usual exception handling) which is an endless source of little bugs, though in practice as people don't raise exceptions much the bugs aren't noticed... Ciao, Duncan. From resistor at mac.com Mon Jul 22 00:15:48 2013 From: resistor at mac.com (Owen Anderson) Date: Mon, 22 Jul 2013 00:15:48 -0700 Subject: [LLVMdev] Inst field in MSP430InstrFormats.td In-Reply-To: References: Message-ID: <35C8E458-E1F7-4527-812A-B0508BAC5E82@mac.com> The Inst field is used to specify instruction encodings, which are then used to generate assemblers and disassemblers. I'm not sure offhand, but it's possible that the MSP430 backend doesn't make use of an auto-generated assembler. --Owen On Jul 21, 2013, at 4:19 PM, David Wiberg wrote: > Hello, > > Within the file "MSP430InstrFormats.td" there is a class called > "MSP430Inst" which has "Instruction" as superclass. Within this class > there is a field called "Inst" (field bits<16> Inst;) which gets > assigned in classes which specifies a specific instruction format, > e.g. 
IForm contains: > let Inst{12-15} = opcode; > let Inst{7} = ad.Value; > let Inst{6} = bw; > let Inst{4-5} = as.Value; > > From what I can see these values do not propagate to the files > generated by TableGen. I have tried removing the field and it is at > least still possible to run llc on a simple input file. This leads to > a couple of questions: > 1. Is this field redundant or does it have some use which I have missed? > 2. What is the status of the MSP430 backend? In a mail from 2012 > (http://lists.ransford.org/pipermail/llvm-msp430/2012-February/000194.html) > Anton indicates that it is usable but is it a good basis for backend > studies? > > My goal is to implement support for a new target and one step on the > way is to fully understand how one of the existing backends works. > > Thanks > David > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From nicholas at mxc.ca Mon Jul 22 00:24:19 2013 From: nicholas at mxc.ca (Nick Lewycky) Date: Mon, 22 Jul 2013 00:24:19 -0700 Subject: [LLVMdev] Does nounwind have semantics? In-Reply-To: <251BD6D4E6A77E4586B482B33960D2283360813B@HASMSX106.ger.corp.intel.com> References: <58281658-BDAE-4D12-859A-859308EE6FF8@apple.com> <51ECB01C.5020605@mxc.ca> <251BD6D4E6A77E4586B482B33960D2283360813B@HASMSX106.ger.corp.intel.com> Message-ID: <51ECDE23.2020004@mxc.ca> Kuperstein, Michael M wrote: > I'm not sure I understand why it's blocked on that, by the way. It blocks our ability to automatically deduce the halting attribute in the optimizer, which was necessary for the use case I had at the time. If you have a use case of your own, feel free to propose the patch! (Technically it's not *blocked* -- see how my patch does it! -- but the workarounds are too horrible to be committed.) > Even if we can't apply the attribute ourselves, I don't see why we wouldn't expose that ability to frontends.
Frontends are free to put attributes on functions if they want to. Go for it! > I'm not entirely sure "halting" is the right attribute either, by the way. > What I, personally, would like to see is a way to specify a function call is safe to speculatively execute. That implies readnone (not just readonly), nounwind, halting - and Eris knows what else. Nick, is that too strong for you? I strongly prefer the approach of having orthogonal attributes. There are optimizations that you can do with each of these attributes on their own. In particular I think that readonly+halting+nounwind+nolongjmp is going to be common and I'd feel silly if we had a special case for readnone+halting+nounwind+nolongjmp and thus couldn't optimize the more common case. That said, I'm also going to feel silly if we don't end up with enough attributes to allow isSafeToSpeculate to deduce it, which is where we are right now. I was planning to get back to fixing this after Chandler's promised PassManager work. Nick > > Michael > > -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Nick Lewycky > Sent: Monday, July 22, 2013 07:08 > To: Andrew Trick > Cc: llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] Does nounwind have semantics? > > Andrew Trick wrote: >> Does 'nounwind' have semantics that inform optimization passes? It seems to in some cases, but not consistently. For example... >> >> int32_t foo(int32_t* ptr) { >> int i = 0; >> int result; >> do { >> bar(ptr); >> result = *ptr; >> bar(ptr); >> } while (i++< *ptr); >> return result; >> } >> >> Say we have a front end that declares bar as... >> >> declare void @bar(i32*) readonly; >> >> So 'bar' is 'readonly' and 'may-unwind'. >> >> When LICM tries to hoist the load it interprets the 'may-unwind' as "MayThrow" in LICM-language and bails. However, when it tries to sink the call itself it sees the 'readonly', assumes no side effects and sinks it below the loads. Hmm... 
>> >> There doesn't appear to be a way to declare a function that is guaranteed not to write to memory in a way that affects the caller, but may have another well-defined side effect like aborting the program. This is interesting, because that is the way runtime checks for safe languages would like to be defined. I'm perfectly happy telling front ends to generate control flow for well-defined traps, since I like lots of basic blocks in my IR. But I'm still curious how others deal with this. > > Yes, we went through a phase where people would try to use "nounwind+readonly == no side-effects" to optimize. All such optimizations are wrong. Unless otherwise proven, a function may inf-loop, terminate the program, or longjmp. > > I tried to add 'halting' to help solve part of this a long time ago, but it never went in. The problem is that determining whether you have loops requires a FunctionPass (LoopInfo to find loops and SCEV to determine an upper bound) and applying function attributes is an SCC operation (indeed, an SCC is itself a loop), so it's all blocked behind fixing the PassManager to allow CGSGGPasses to depend on FunctionPasses. > http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20100705/103670.html > > I'm now in a similar situation where I want 'nounwind' to mean "only exits by terminating the program or a return instruction" but unfortunately functions which longjmp are considered nounwind. I would like to change llvm to make longjmp'ing a form of unwinding (an exceptional exit to the function), but if I were to apply that rule today then we'd start putting dwarf eh tables on all our C code, oops. 
> > Nick > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > --------------------------------------------------------------------- > Intel Israel (74) Limited > > This e-mail and any attachments may contain confidential material for > the sole use of the intended recipient(s). Any review or distribution > by others is strictly prohibited. If you are not the intended > recipient, please contact the sender and delete all copies. > > From atrick at apple.com Mon Jul 22 00:32:48 2013 From: atrick at apple.com (Andrew Trick) Date: Mon, 22 Jul 2013 00:32:48 -0700 Subject: [LLVMdev] Does nounwind have semantics? In-Reply-To: <51ECD770.9090809@free.fr> References: <58281658-BDAE-4D12-859A-859308EE6FF8@apple.com> <51ECD770.9090809@free.fr> Message-ID: On Jul 21, 2013, at 11:55 PM, Duncan Sands wrote: > Hi Andrew, > > On 22/07/13 02:56, Andrew Trick wrote: >> Does 'nounwind' have semantics that inform optimization passes? It seems to in some cases, but not consistently. For example... >> >> int32_t foo(int32_t* ptr) { >> int i = 0; >> int result; >> do { >> bar(ptr); >> result = *ptr; >> bar(ptr); >> } while (i++ < *ptr); >> return result; >> } >> >> Say we have a front end that declares bar as... >> >> declare void @bar(i32*) readonly; >> >> So 'bar' is 'readonly' and 'may-unwind'. >> >> When LICM tries to hoist the load it interprets the 'may-unwind' as "MayThrow" in LICM-language and bails. However, when it tries to sink the call itself it sees the 'readonly', assumes no side effects and sinks it below the loads. Hmm... > > is your worry here about the following case? > - the load will trap if executed > - bar throws an exception > Thus with the original code the trap will not occur, because an exception will > be thrown first, while if you move the first bar call below the load then the > tap will occur. Essentially, yes. 
My takeaway from looking into it is: - nounwind means no dwarf EH. Absence of nounwind means absence of dwarf EH. It would be unwise for optimization passes to reason about the semantics beyond that. I was momentarily mislead by the LICM code that handles MayThrow specially. - Things that throw exceptions or trap in defined ways are not readonly. - Runtime checks for overflow, div-by-zero, bounds checks, etc. should be implemented at the IR level as branches to noreturn calls because it can be done that way and I haven’t seen concrete evidence that it’s too expensive. Don’t try to do something fancy with intrinsics and attributes unless absolutely required. - Optimizing readonly calls in C is a tangentially related issue, as Nick explained. My answer to that problem is that C compilers are effectively forced to assume that calls terminate, so developers should not expect otherwise. If C developers don’t want the compiler to optimize their infinite loop or infinite recursion, they need to throw in a volatile dereference. -Andy >> >> There doesn't appear to be a way to declare a function that is guaranteed not to write to memory in a way that affects the caller, but may have another well-defined side effect like aborting the program. > > I'm pretty sure that exiting the program is considered to write memory, so bar > can't do that itself. > > Ciao, Duncan. > > This is interesting, because that is the way runtime checks for safe languages would like to be defined. I'm perfectly happy telling front ends to generate control flow for well-defined traps, since I like lots of basic blocks in my IR. But I'm still curious how others deal with this. 
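The third bullet in the takeaway above — runtime checks as explicit branches to noreturn calls — is easy to sketch at the source level. This is an illustration of the pattern only (the helper names are invented here, not taken from the thread): the guard is ordinary, optimizer-visible control flow, and the failure path is a call to a _Noreturn function.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* The failure path: a noreturn function, so the branch to it is
   explicit control flow rather than a magic "may-trap" call. */
_Noreturn static void overflow_trap(void) {
    fprintf(stderr, "integer overflow\n");
    abort();
}

/* Checked addition: the guard is a plain compare-and-branch that
   passes like LICM can reason about directly. */
static int checked_add(int a, int b) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        overflow_trap();
    return a + b;
}
```

Because the check is explicit in the CFG, no special "readonly but may abort" attribute is needed for the call that performs the work: the trap is reachable only through a visible branch.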
>> >> -Andy >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From baldrick at free.fr Mon Jul 22 00:56:14 2013 From: baldrick at free.fr (Duncan Sands) Date: Mon, 22 Jul 2013 09:56:14 +0200 Subject: [LLVMdev] Does nounwind have semantics? In-Reply-To: References: <58281658-BDAE-4D12-859A-859308EE6FF8@apple.com> <51ECD770.9090809@free.fr> Message-ID: <51ECE59E.8040704@free.fr> Hi Andrew, On 22/07/13 09:32, Andrew Trick wrote: > > On Jul 21, 2013, at 11:55 PM, Duncan Sands > wrote: > >> Hi Andrew, >> >> On 22/07/13 02:56, Andrew Trick wrote: >>> Does 'nounwind' have semantics that inform optimization passes? It seems to >>> in some cases, but not consistently. For example... >>> >>> int32_t foo(int32_t* ptr) { >>> int i = 0; >>> int result; >>> do { >>> bar(ptr); >>> result = *ptr; >>> bar(ptr); >>> } while (i++ < *ptr); >>> return result; >>> } >>> >>> Say we have a front end that declares bar as... >>> >>> declare void @bar(i32*) readonly; >>> >>> So 'bar' is 'readonly' and 'may-unwind'. >>> >>> When LICM tries to hoist the load it interprets the 'may-unwind' as >>> "MayThrow" in LICM-language and bails. However, when it tries to sink the >>> call itself it sees the 'readonly', assumes no side effects and sinks it >>> below the loads. Hmm... >> >> is your worry here about the following case? >> - the load will trap if executed >> - bar throws an exception >> Thus with the original code the trap will not occur, because an exception will >> be thrown first, while if you move the first bar call below the load then the >> tap will occur. > > Essentially, yes. 
My takeaway from looking into it is: my understanding is different. I'm pretty sure that what I'm about to say is the traditional way these things have been viewed in LLVM. That doesn't mean that it's the best way to view these things. > - nounwind means no dwarf EH. Absence I guess you mean presence. of nounwind means absence of dwarf EH. It > would be unwise for optimization passes to reason about the semantics beyond > that. I was momentarily mislead by the LICM code that handles MayThrow specially. nounwind has nothing to do with dwarf, since exceptions themselves need have nothing to do with dwarf (which is just one way of implementing exception handling). Don't forget setjmp/longjmp exception handling, and also exception handling by returning an extra invisible parameter (which I mentioned in another email) which IIRC was actually implemented by someone at the codegen level at some point as it was faster than dwarf for code that throws exceptions a lot. An additional point is that while in C++ you create an exception object, not all languages associate an object with an exception, some just want to do the equivalent of a non-local goto. Creating an exception object means allocating memory, mucking around with global data structures and obviously writing memory. A non-local goto doesn't have to do more than unwind the stack until it gets to the right frame then do a jump. It's not clear to me that that should be considered as writing memory. Here's my take: a call to an function marked nounwind either never returns (eg infinite loop or exits the program) or returns normally. It doesn't "return" by unwinding the stack out of the caller. On the other hand a function that is not marked nounwind may "return" by unwinding the stack; control in this case doesn't continue in the caller, it continues at least one further up the stack. Thus in this case the instructions after the call instruction are not executed. 
Note I'm talking about an ordinary call here, not an invoke. In the case of an invoke control may continue in the caller function, but only at a well-defined point (the landing pad). > - Things that throw exceptions or trap in defined ways are not readonly. See above for why throwing an exception doesn't have to write memory. Dwarf exception handling, and anything which can be used to implement C++ exception handling, is clearly writing memory and thus cannot be used inside a readonly function. So yes, any function Clang produces that throws an exception is not going to be readonly. But as I mentioned above some languages have no exception object and just unwind the stack. For these the expression "throwing an exception" (which implicitly includes the idea that there is an exception object) is not really appropriate; "unwinds the stack" is the basic concept here. This is basically orthogonal to readonly. > - Runtime checks for overflow, div-by-zero, bounds checks, etc. should be > implemented at the IR level as branches to noreturn calls because it can be done > that way and I haven’t seen concrete evidence that it’s too expensive. Don’t try > to do something fancy with intrinsics and attributes unless absolutely required. I agree with this. Ciao, Duncan. > - Optimizing readonly calls in C is a tangentially related issue, as Nick > explained. My answer to that problem is that C compilers are effectively forced > to assume that calls terminate, so developers should not expect otherwise. If C > developers don’t want the compiler to optimize their infinite loop or infinite > recursion, they need to throw in a volatile dereference. > > -Andy > >>> >>> There doesn't appear to be a way to declare a function that is guaranteed not >>> to write to memory in a way that affects the caller, but may have another >>> well-defined side effect like aborting the program. >> >> I'm pretty sure that exiting the program is considered to write memory, so bar >> can't do that itself. 
>> >> Ciao, Duncan. >> >> This is interesting, because that is the way runtime checks for safe languages >> would like to be defined. I'm perfectly happy telling front ends to generate >> control flow for well-defined traps, since I like lots of basic blocks in my >> IR. But I'm still curious how others deal with this. >>> >>> -Andy >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From michael.m.kuperstein at intel.com Mon Jul 22 01:11:01 2013 From: michael.m.kuperstein at intel.com (Kuperstein, Michael M) Date: Mon, 22 Jul 2013 08:11:01 +0000 Subject: [LLVMdev] Does nounwind have semantics? In-Reply-To: <51ECDE23.2020004@mxc.ca> References: <58281658-BDAE-4D12-859A-859308EE6FF8@apple.com> <51ECB01C.5020605@mxc.ca> <251BD6D4E6A77E4586B482B33960D2283360813B@HASMSX106.ger.corp.intel.com> <51ECDE23.2020004@mxc.ca> Message-ID: <251BD6D4E6A77E4586B482B33960D22833608266@HASMSX106.ger.corp.intel.com> Of course frontends are free to put attributes, but it would be nice if optimizations actually used them. ;-) My use case is that of proprietary frontend that happens to know some library function calls - which are only resolved at link time - have no side effects and are safe to execute speculatively, and wants to tell the optimizer it can move them around however it likes. I'll gladly submit a patch that uses these hints, but I'd like to reach some consensus on what the desired attributes actually are first. The last thing I want is to add attributes that are only useful to myself. Regarding having several orthogonal attributes vs. 
things like "safetospeculate": To know a function is safe to speculatively execute, I need at least: 1) readnone (readonly is insufficient, unless I know all accessed pointers are valid) 2) nounwind 3) nolongjmp (I guess?) 4) no undefined behavior. This includes things like "halting" and "no division by zero", but that's not, by far, an exhaustive list. I guess there are several ways to handle (4). Ideally, I agree with you, we'd like a set of orthogonal attributes that, taken together, imply that the function's behavior is not undefined. But that requires mapping all sources of undefined behavior (I don't think this is currently documented for LLVM IR, at least not in a centralized fashion) and adding a very specific attribute for each of them. I'm not sure having function declarations with "readnone, nounwind, nolongjmp, halting, nodivbyzero, nopoisonval, nocomparelabels, nounreachable, ..." is desirable. We could also have a "welldefined" attribute and a "halting" attribute where "welldefined" subsumes "halting", if the specific case of a function which halts but may have undefined behavior is important. While the two are not orthogonal, it's similar to the situation with "readnone" and "readonly". Does that sound reasonable? Michael -----Original Message----- From: Nick Lewycky [mailto:nicholas at mxc.ca] Sent: Monday, July 22, 2013 10:24 To: Kuperstein, Michael M Cc: Andrew Trick; llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Does nounwind have semantics? Kuperstein, Michael M wrote: > I'm not sure I understand why it's blocked on that, by the way. It blocks our ability to automatically deduce the halting attribute in the optimizer, which was necessary for the use case I had at the time. If you have a use case of your own, feel free to propose the patch! (Technically it's not *blocked* -- see how my patch does it! -- but the workarounds are too horrible to be committed.) 
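One way to picture the trade-off being discussed — orthogonal attributes versus a single "safetospeculate" — is to keep the attributes separate and derive speculation safety as a query over them. The sketch below is purely illustrative: none of these names exist in LLVM, and the bit-per-attribute struct is an invented stand-in for IR function attributes.

```c
#include <stdbool.h>

/* Invented stand-in for per-function attributes. */
struct fn_attrs {
    bool readnone;    /* touches no memory at all                  */
    bool nounwind;    /* never unwinds the stack                   */
    bool nolongjmp;   /* never exits via longjmp                   */
    bool halting;     /* always terminates                         */
    bool welldefined; /* no undefined behavior; subsumes 'halting'
                         the way 'readnone' subsumes 'readonly'    */
};

/* Speculation safety as a derived property: every orthogonal
   requirement must hold. Since 'welldefined' implies 'halting',
   the latter need not be tested separately here, but it still
   pays off on its own for weaker queries (e.g. deleting an
   unused call to a readonly, halting function). */
static bool is_safe_to_speculate(const struct fn_attrs *a)
{
    return a->readnone && a->nounwind && a->nolongjmp && a->welldefined;
}
```

Under this arrangement the optimizer never needs a monolithic attribute: the common weaker combinations remain expressible, and the strong query falls out of their conjunction.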
> Even if we can't apply the attribute ourselves, I don't see why we wouldn't expose that ability to frontends. Frontends are free to put attributes on functions if they want to. Go for it! > I'm not entirely sure "halting" is the right attribute either, by the way. > What I, personally, would like to see is a way to specify a function call is safe to speculatively execute. That implies readnone (not just readonly), nounwind, halting - and Eris knows what else. Nick, is that too strong for you? I strongly prefer the approach of having orthogonal attributes. There are optimizations that you can do with each of these attributes on their own. In particular I think that readonly+halting+nounwind+nolongjmp is going to be common and I'd feel silly if we had a special case for readnone+halting+nounwind+nolongjmp and thus couldn't optimize the more common case. That said, I'm also going to feel silly if we don't end up with enough attributes to allow isSafeToSpeculate to deduce it, which is where we are right now. I was planning to get back to fixing this after Chandler's promised PassManager work. Nick > > Michael > > -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Nick Lewycky > Sent: Monday, July 22, 2013 07:08 > To: Andrew Trick > Cc: llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] Does nounwind have semantics? > > Andrew Trick wrote: >> Does 'nounwind' have semantics that inform optimization passes? It seems to in some cases, but not consistently. For example... >> >> int32_t foo(int32_t* ptr) { >> int i = 0; >> int result; >> do { >> bar(ptr); >> result = *ptr; >> bar(ptr); >> } while (i++< *ptr); >> return result; >> } >> >> Say we have a front end that declares bar as... >> >> declare void @bar(i32*) readonly; >> >> So 'bar' is 'readonly' and 'may-unwind'. >> >> When LICM tries to hoist the load it interprets the 'may-unwind' as "MayThrow" in LICM-language and bails. 
However, when it tries to sink the call itself it sees the 'readonly', assumes no side effects and sinks it below the loads. Hmm... >> >> There doesn't appear to be a way to declare a function that is guaranteed not to write to memory in a way that affects the caller, but may have another well-defined side effect like aborting the program. This is interesting, because that is the way runtime checks for safe languages would like to be defined. I'm perfectly happy telling front ends to generate control flow for well-defined traps, since I like lots of basic blocks in my IR. But I'm still curious how others deal with this. > > Yes, we went through a phase where people would try to use "nounwind+readonly == no side-effects" to optimize. All such optimizations are wrong. Unless otherwise proven, a function may inf-loop, terminate the program, or longjmp. > > I tried to add 'halting' to help solve part of this a long time ago, but it never went in. The problem is that determining whether you have loops requires a FunctionPass (LoopInfo to find loops and SCEV to determine an upper bound) and applying function attributes is an SCC operation (indeed, an SCC is itself a loop), so it's all blocked behind fixing the PassManager to allow CGSGGPasses to depend on FunctionPasses. > http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20100705/103670.html > > I'm now in a similar situation where I want 'nounwind' to mean "only exits by terminating the program or a return instruction" but unfortunately functions which longjmp are considered nounwind. I would like to change llvm to make longjmp'ing a form of unwinding (an exceptional exit to the function), but if I were to apply that rule today then we'd start putting dwarf eh tables on all our C code, oops. 
> > Nick > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > --------------------------------------------------------------------- > Intel Israel (74) Limited > > This e-mail and any attachments may contain confidential material for > the sole use of the intended recipient(s). Any review or distribution > by others is strictly prohibited. If you are not the intended > recipient, please contact the sender and delete all copies. > > --------------------------------------------------------------------- Intel Israel (74) Limited This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. From atrick at apple.com Mon Jul 22 01:23:08 2013 From: atrick at apple.com (Andrew Trick) Date: Mon, 22 Jul 2013 01:23:08 -0700 Subject: [LLVMdev] Does nounwind have semantics? In-Reply-To: <51ECE59E.8040704@free.fr> References: <58281658-BDAE-4D12-859A-859308EE6FF8@apple.com> <51ECD770.9090809@free.fr> <51ECE59E.8040704@free.fr> Message-ID: On Jul 22, 2013, at 12:56 AM, Duncan Sands wrote: > my understanding is different. I'm pretty sure that what I'm about to say is > the traditional way these things have been viewed in LLVM. That doesn't mean > that it's the best way to view these things. > >> - nounwind means no dwarf EH. Absence > > I guess you mean presence. > > of nounwind means absence of dwarf EH. It >> would be unwise for optimization passes to reason about the semantics beyond >> that. I was momentarily mislead by the LICM code that handles MayThrow specially. > > nounwind has nothing to do with dwarf, since exceptions themselves need have > nothing to do with dwarf (which is just one way of implementing exception > handling). 
Don't forget setjmp/longjmp exception handling, and also exception > handling by returning an extra invisible parameter (which I mentioned in > another email) which IIRC was actually implemented by someone at the codegen > level at some point as it was faster than dwarf for code that throws exceptions > a lot. An additional point is that while in C++ you create an exception object, > not all languages associate an object with an exception, some just want to do > the equivalent of a non-local goto. Creating an exception object means > allocating memory, mucking around with global data structures and obviously > writing memory. A non-local goto doesn't have to do more than unwind the stack > until it gets to the right frame then do a jump. It's not clear to me that that > should be considered as writing memory. > > Here's my take: a call to a function marked nounwind either never returns > (eg infinite loop or exits the program) or returns normally. It doesn't > "return" by unwinding the stack out of the caller. On the other hand a > function that is not marked nounwind may "return" by unwinding the stack; > control in this case doesn't continue in the caller, it continues at least one > further up the stack. Thus in this case the instructions after the call > instruction are not executed. Note I'm talking about an ordinary call here, > not an invoke. In the case of an invoke control may continue in the caller > function, but only at a well-defined point (the landing pad). Good explanation. Your definition of nounwind is completely logical. I would prefer not to rely on it though because - Realistically, the semantics won’t be well tested. - It doesn’t seem terribly important to treat nonlocal gotos as readonly (though maybe it is to you :) - When it is important to optimize memory access around nonlocal gotos, I prefer to expose control flow to the optimizer explicitly. e.g.
why not just use invokes for all your may-throw calls, then you’re free to mark them readonly? -Andy >> - Things that throw exceptions or trap in defined ways are not readonly. > > See above for why throwing an exception doesn't have to write memory. Dwarf > exception handling, and anything which can be used to implement C++ exception > handling, is clearly writing memory and thus cannot be used inside a readonly > function. So yes, any function Clang produces that throws an exception is not > going to be readonly. But as I mentioned above some languages have no > exception object and just unwind the stack. For these the expression "throwing > an exception" (which implicitly includes the idea that there is an exception > object) is not really appropriate; "unwinds the stack" is the basic concept > here. This is basically orthogonal to readonly. > >> - Runtime checks for overflow, div-by-zero, bounds checks, etc. should be >> implemented at the IR level as branches to noreturn calls because it can be done >> that way and I haven’t seen concrete evidence that it’s too expensive. Don’t try >> to do something fancy with intrinsics and attributes unless absolutely required. > > I agree with this. -------------- next part -------------- An HTML attachment was scrubbed... URL: From baldrick at free.fr Mon Jul 22 01:37:36 2013 From: baldrick at free.fr (Duncan Sands) Date: Mon, 22 Jul 2013 10:37:36 +0200 Subject: [LLVMdev] Does nounwind have semantics? In-Reply-To: References: <58281658-BDAE-4D12-859A-859308EE6FF8@apple.com> <51ECD770.9090809@free.fr> <51ECE59E.8040704@free.fr> Message-ID: <51ECEF50.3000509@free.fr> Hi Andrew, On 22/07/13 10:23, Andrew Trick wrote: > > On Jul 22, 2013, at 12:56 AM, Duncan Sands > wrote: > >> my understanding is different. I'm pretty sure that what I'm about to say is >> the traditional way these things have been viewed in LLVM. That doesn't mean >> that it's the best way to view these things. >> >>> - nounwind means no dwarf EH. 
Absence >> >> I guess you mean presence. >> >> of nounwind means absence of dwarf EH. It >>> would be unwise for optimization passes to reason about the semantics beyond >>> that. I was momentarily misled by the LICM code that handles MayThrow specially. >> >> nounwind has nothing to do with dwarf, since exceptions themselves need have >> nothing to do with dwarf (which is just one way of implementing exception >> handling). Don't forget setjmp/longjmp exception handling, and also exception >> handling by returning an extra invisible parameter (which I mentioned in >> another email) which IIRC was actually implemented by someone at the codegen >> level at some point as it was faster than dwarf for code that throws exceptions >> a lot. An additional point is that while in C++ you create an exception object, >> not all languages associate an object with an exception, some just want to do >> the equivalent of a non-local goto. Creating an exception object means >> allocating memory, mucking around with global data structures and obviously >> writing memory. A non-local goto doesn't have to do more than unwind the stack >> until it gets to the right frame then do a jump. It's not clear to me that that >> should be considered as writing memory. >> >> Here's my take: a call to a function marked nounwind either never returns >> (eg infinite loop or exits the program) or returns normally. It doesn't >> "return" by unwinding the stack out of the caller. On the other hand a >> function that is not marked nounwind may "return" by unwinding the stack; >> control in this case doesn't continue in the caller, it continues at least one >> further up the stack. Thus in this case the instructions after the call >> instruction are not executed. Note I'm talking about an ordinary call here, >> not an invoke. In the case of an invoke control may continue in the caller >> function, but only at a well-defined point (the landing pad). > > Good explanation.
Your definition of nounwind is completely logical. I would > prefer not to rely on it though because > - Realistically, the semantics won’t be well tested. this is true. Those who need this will have to do careful testing and bug fixing themselves. But to my mind this is not a reason to abandon these semantics. A good reason to abandon these semantics is if they created trouble for a large number of users, e.g. clang, for example by making it hard to do important optimizations. Do they? Are these semantics actively harmful? If not I'd prefer to keep them since they are fairly clean. > - It doesn’t seem terribly important to treat nonlocal gotos as readonly (though > maybe it is to you :) It's not a problem for me, I just like to keep orthogonal things orthogonal. Maybe that's a mistake and we should rename readonly to "nothing funky happens". > - When it is important to optimize memory access around nonlocal gotos, I prefer > to expose control flow to the optimizer explicitly. > e.g. why not just use invokes for all your may-throw calls, then you’re free to > mark them readonly? Let's say that we are in function foo. We call bar. We were called by qaz. I'm not talking about non-local gotos that jump from bar to somewhere in us (foo). These indeed need to be modelled with invoke. I'm talking about those that jump from bar to somewhere in qaz. That means that the place in qaz that called us (foo) will need to use an invoke. But from the point of view of foo, the call to bar never returns to foo, it just instantly leaps across foo all the way up to qaz and execution continues there. So foo doesn't need to do anything much, all the heavy lifting is in bar and in qaz. All foo has to do is keep in mind that control may never get to the next instruction after the call to bar. Ciao, Duncan. > > -Andy > >>> - Things that throw exceptions or trap in defined ways are not readonly. >> >> See above for why throwing an exception doesn't have to write memory. 
Dwarf >> exception handling, and anything which can be used to implement C++ exception >> handling, is clearly writing memory and thus cannot be used inside a readonly >> function. So yes, any function Clang produces that throws an exception is not >> going to be readonly. But as I mentioned above some languages have no >> exception object and just unwind the stack. For these the expression "throwing >> an exception" (which implicitly includes the idea that there is an exception >> object) is not really appropriate; "unwinds the stack" is the basic concept >> here. This is basically orthogonal to readonly. >> >>> - Runtime checks for overflow, div-by-zero, bounds checks, etc. should be >>> implemented at the IR level as branches to noreturn calls because it can be done >>> that way and I haven’t seen concrete evidence that it’s too expensive. Don’t try >>> to do something fancy with intrinsics and attributes unless absolutely required. >> >> I agree with this. > From tanmx_star at yeah.net Mon Jul 22 01:46:37 2013 From: tanmx_star at yeah.net (Star Tan) Date: Mon, 22 Jul 2013 16:46:37 +0800 (CST) Subject: [LLVMdev] Analysis of polly-detect overhead in oggenc In-Reply-To: <51ECB235.4060901@grosser.es> References: <51E02B5C.3030208@grosser.es> <727cd06.251b.13fd9059546.Coremail.tanmx_star@yeah.net> <51E19CAF.4050500@grosser.es> <73bc0067.345f.13fdb666d45.Coremail.tanmx_star@yeah.net> <51E2352A.6090706@grosser.es> <7aa4b9a.3bcb.13fdc808c5f.Coremail.tanmx_star@yeah.net> <32e1ec0e.464a.13fddb6c5c7.Coremail.tanmx_star@yeah.net> <51E31F3E.3080501@grosser.es> <51EC1D0F.6000800@grosser.es> <1f586e46.3cca.1400439c171.Coremail.tanmx_star@yeah.net> <51ECB235.4060901@grosser.es> Message-ID: <8e12161.64f9.140058f158f.Coremail.tanmx_star@yeah.net> At 2013-07-22 12:16:53,"Tobias Grosser" wrote: >On 07/21/2013 07:33 PM, Star Tan wrote: >> At 2013-07-22 01:40:31,"Tobias Grosser" wrote: >> >>> On 07/21/2013 09:49 AM, Star Tan wrote: >>>> Hi all, >>>> >>>> >>>> I have attached a patch 
file to reduce the polly-detect overhead. >>> >>> Great. >>> >>>> My idea is to avoid calling TypeFinder in Non-DEBUG mode, so >>>> TypeFinder is only called in DEBUG mode with the DEBUG macro. This >>>> patch file did this work with the following modifications: >>>> >>>> >>>> First, it keeps most of string information by replacing "<<" with "+" operation. For example, code like this: >>>> INVALID(CFG, "Non branch instruction terminates BB: " + BB.getName()); >>>> would be converted into: >>>> LastFailure = "Non branch instruction terminates BB: " + BB.getName().str(); >>>> >>>> >>>> Second, it simplifies some complex operations like: >>>> INVALID(AffFunc, >>>> "Non affine branch in BB '" << BB.getName() << "' with LHS: " >>>> << *LHS << " and RHS: " << *RHS); >>>> into: >>>> LastFailure = "Non affine branch in BB: " + BB.getName().str(); >>> >>> >>>> In such cases, some information for "LHS" and "RHS" is missed. >>> >>> Yes. And unfortunately, we may also lose the 'name' of unnamed basic >>> blocks. Basic blocks without a name get formatted as %111 with '111' >>> being their sequence number. Your above change will not be able to >>> derive this number. >> Yes, but it is much cheaper by using BB.getName().str(). >> Currently, I cannot find a better way to derive unnamed basic block without calling TypeFinder. > >Yes, that's the reason why it was used in the first place. > >>>> However, it only has little effect on the variable "LastFailure", >>>> while keeping all information for DEBUG information. >>> >>> Why is that? It seems the DEBUG output is the very same that gets >>> written to "LastFailure". >> No, they are different.
For example, the code like this: >> INVALID(AffFunc, >> "Non affine branch in BB '" << BB.getName() << "' with LHS: " >> << *LHS << " and RHS: " << *RHS); >> would be converted to: >> LastFailure = "Non affine branch in BB: " + BB.getName().str(); >> INVALID(AffFunc, >> LastFailure << "' with LHS: " << *LHS << " and RHS: " << *RHS); >> You see, information about LHS and RHS would be kept in INVALID DEBUG information. >> To keep the information about unnamed basic blocks, I think we can rewrite it as: >> FailString = "Non affine branch in BB: "; >> INVALID(AffFunc, >> FailString << BB.getName() << "' with LHS: " >> << *LHS << " and RHS: " << *RHS); >> LastFailure = FailString + BB.getName().str(); >>> >>>> Since the >>>> variable "LastFailure" is only used in ScopGraphPrinter, which should >>>> only show critical information in graph, I hope such modification is >>>> acceptable. >>> >>> Why should we only show critical information? In the GraphPrinter we do >>> not worry about compile time so much, such that we can easily show >>> helpful information. We just need to make sure that we do not slow down >>> the compile-time in the generic case. >> To my knowledge, all of those expensive operations are only used for the variable "LastFailure", which is only used in GraphPrinter. Do you mean the Graph Printer does not execute in the generic case? If that is true, I think we should control those operations for "LastFailure" as follows: >> if (In GraphPrinter mode) { >> LastFailure = ... >> } >> In this way, we can completely remove those operations for "LastFailure" in the generic case except in GraphPrinter mode. > >Yes. > > >> What do you mean with "Normal Execution"? Does the GraphPrinter executes in "normal case"? >> If GraphPrinter is not executes in "normal case", we should move all operations for "LastFailure" under the condition of "GraphPrinter" mode. > >It is not executed in normal mode. 
So yes, we should move all this under >the condition of 'GraphPrintingMode'. > >>> I am not yet fully sure how the reportInvalid* functions should look >>> like. However, the part of your patch that is easy to commit without >>> further discussion is to move the 'return false' out of the INVALID >>> macro. Meaning, translating the above code to: >>> >>> if (checkSomething()) { >>> INVALID(AffFunc, "Test" << SCEV <<); >>> return false; >>> } >>> >>> I propose to submit such a patch first and then focus on the remaining >>> problems. >> Yes, I agree with you. I have attached a patch file to move "return false" out of INVALID macro. > >Great. Committed in r186805. > >I propose two more patches: > > 1) Transform the INVALID macro into function calls, that format > the text and that set LastFailure. Translating the INVALID macro into function calls would complicate the operations for different counters. For example, currently users can add a new counter by simply adding the following line: BADSCOP_STAT(SimpleLoop, "Loop not in -loop-simplify form"); But if we translate INVALID macro into function calls, then users have to add a function for this counter: void INVLIAD_SimpleLoop (...). This is because we cannot use the following macro combination in function calls: if (!Context.Verifying) \ ++Bad##NAME##ForScop; So, I do not think it is necessary to translate the INVALID macro into function calls. Do you still think we should translate INVALID macro into a series of functions like "invalid_CFG, invalid_IndVar, invalid_IndEdge, ... ? In that case, I could provide a small patch file for this purpose -:) > 2) Add checks at the beginning of those function calls and > continue only if LogErrors is set Those invalid log strings are used for two separate cases: 1) The first case is for detailed debugging, which is controlled by the macro DEBUG(dbgs() << MESSAGE).
In such a case, string operations will automatically be skipped in normal execution mode with the following if-statement: if (::llvm::DebugFlag && ::llvm::isCurrentDebugType(TYPE)) That means string operations controlled by DEBUG will not execute in normal case, so we should not worry about it. 2) The other case is for the variable "LastFailure", which is used only in GraphPrinter. Currently string operations for "LastFailure" always execute in normal cases. My idea is to put such string operations under the condition of "GraphPrinter" mode. For example, I would like to translate the "INVALID" macro into: #define INVALID(NAME, MESSAGE) \ do { \ if (GraphViewMode) { \ std::string Buf; \ raw_string_ostream fmt(Buf); \ fmt << MESSAGE; \ fmt.flush(); \ LastFailure = Buf; \ } \ DEBUG(dbgs() << MESSAGE); \ DEBUG(dbgs() << "\n"); \ assert(!Context.Verifying && #NAME); \ if (!Context.Verifying) \ ++Bad##NAME##ForScop; \ } while (0) As you have suggested, we can construct the condition GraphViewMode with "-polly-show", "-polly-show-only", "polly-dot" and "polly-dot-only". However, I see all these options are defined as "static" variables in lib/RegisterPasses.cpp. Do you think I should translate these local variables into global variables or should I define another option like "-polly-dot-scop" in ScopDetection.cpp? Thanks, Star Tan -------------- next part -------------- An HTML attachment was scrubbed... URL: From atrick at apple.com Mon Jul 22 02:17:27 2013 From: atrick at apple.com (Andrew Trick) Date: Mon, 22 Jul 2013 02:17:27 -0700 Subject: [LLVMdev] Does nounwind have semantics? In-Reply-To: <51ECEF50.3000509@free.fr> References: <58281658-BDAE-4D12-859A-859308EE6FF8@apple.com> <51ECD770.9090809@free.fr> <51ECE59E.8040704@free.fr> <51ECEF50.3000509@free.fr> Message-ID: <6A8A317C-08D9-4A3E-B710-5D7B3B1E38A4@apple.com> On Jul 22, 2013, at 1:37 AM, Duncan Sands wrote: >> Good explanation. Your definition of nounwind is completely logical.
I would >> prefer not to rely on it though because >> - Realistically, the semantics won’t be well tested. > > this is true. Those who need this will have to do careful testing and bug > fixing themselves. But to my mind this is not a reason to abandon these > semantics. A good reason to abandon these semantics is if they created trouble > for a large number of users, e.g. clang, for example by making it hard to do > important optimizations. Do they? Are these semantics actively harmful? > If not I'd prefer to keep them since they are fairly clean. I was justifying my decision not to pursue fixing bugs related to this because of the futility. But you’re right, they can probably be fixed without anyone complaining about performance. >> - It doesn’t seem terribly important to treat nonlocal gotos as readonly (though >> maybe it is to you :) > > It's not a problem for me, I just like to keep orthogonal things orthogonal. > Maybe that's a mistake and we should rename readonly to "nothing funky happens". > >> - When it is important to optimize memory access around nonlocal gotos, I prefer >> to expose control flow to the optimizer explicitly. >> e.g. why not just use invokes for all your may-throw calls, then you’re free to >> mark them readonly? > > Let's say that we are in function foo. We call bar. We were called by qaz. > I'm not talking about non-local gotos that jump from bar to somewhere in us > (foo). These indeed need to be modelled with invoke. I'm talking about those > that jump from bar to somewhere in qaz. That means that the place in qaz that > called us (foo) will need to use an invoke. But from the point of view of foo, > the call to bar never returns to foo, it just instantly leaps across foo all the > way up to qaz and execution continues there. So foo doesn't need to do anything > much, all the heavy lifting is in bar and in qaz. All foo has to do is keep in > mind that control may never get to the next instruction after the call to bar. 
I’m saying that in some hypothetical language where we need to optimize across readonly exception throwing calls, that both calls to foo and qaz could be invokes (assuming the optimizer doesn’t downgrade the invoke-foo). The concept of a may-throw call with implicit control flow is a shortcut for implementing C++ exceptions that makes the IR a tad prettier. The shortcut happens to work most of the time because the calls always write memory. My only point here is that runtimes can still do exceptions however they like and the front end doesn't need your semantics to do it. -Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: From lyh.kernel at gmail.com Mon Jul 22 02:21:56 2013 From: lyh.kernel at gmail.com (lyh.kernel) Date: Mon, 22 Jul 2013 17:21:56 +0800 Subject: [LLVMdev] Advice to API inconsistency between different versions Message-ID: Hello all, LLVM's API varies a lot from version to version. Take an example, header llvm/Target/TargetData.h changed to llvm/DataLayout.h from LLVM version 3.1 to version 3.2. This slices the program up like: #if defined(LLVM_V31) #include llvm/Target/TargetData.h #elif defined(LLVM_V32) #include llvm/DataLayout.h #else #error NEED HEADER The code is in a mess if I want to support previous LLVM versions. I am wondering how you support different LLVM versions and keep the code clean as well? On the other hand, consider the example above. Do you usually check for LLVM version (ex. LLVM_V31, LLVM_V32) or check for feature instead, which uses m4 AC_CHECK_HEADER to detect whether the header exists during configuration? Thanks a lot -------------- next part -------------- An HTML attachment was scrubbed...
URL: From anton at korobeynikov.info Mon Jul 22 02:31:04 2013 From: anton at korobeynikov.info (Anton Korobeynikov) Date: Mon, 22 Jul 2013 13:31:04 +0400 Subject: [LLVMdev] Inst field in MSP430InstrFormats.td In-Reply-To: <35C8E458-E1F7-4527-812A-B0508BAC5E82@mac.com> References: <35C8E458-E1F7-4527-812A-B0508BAC5E82@mac.com> Message-ID: > The Inst field is used to specify instruction encodings, which are then used to generate assemblers and disassemblers. I'm not sure offhand, but it's possible that the MSP430 backend doesn't make use of an auto-generated assembler. Yes. On MSP430 there is only code generation, but still no assembler parsing and disassembling. -- With best regards, Anton Korobeynikov Faculty of Mathematics and Mechanics, Saint Petersburg State University From leftcopy.chx at gmail.com Mon Jul 22 03:00:25 2013 From: leftcopy.chx at gmail.com (Hongxu Chen) Date: Mon, 22 Jul 2013 18:00:25 +0800 Subject: [LLVMdev] How to additionally compile the source files in subdirectories when using Makefile? Message-ID: <874nbmnccm.fsf@gmail.com> Hi guys, I am writing an LLVM pass plugin, and just simply add a directory mypass into llvm/lib/Transforms. But since I don't want this directory to contain too many source files, I add some subdirectories. For instance, there are 2 files inside mypass/sub: basic.h, basic.cpp. Here is the structure of the directory llvm/lib/Transforms/mypass $tree mypass mypass ├── Makefile ├── mypass.c └── sub ├── basic.cpp ├── basic.h └── Makefile The problem is that basic.cpp will never be compiled into the loadable module in which the mypass.c symbols lie. And LLVMLIBS USEDLIBS doesn't work either. Here is mypass/Makefile LEVEL = ../../.. DIRS = sub LIBRARYNAME = mypass LOADABLE_MODULE = 1 include $(LEVEL)/Makefile.common And mypass/sub/Makefile LEVEL = ../../../..
# LIBRARYNAME = test_sub # ARCHIVE_LIBRARY = 1 # LOADABLE_MODULE = 1 include $(LEVEL)/Makefile.common When mypass/sub/Makefile uses LOADABLE_MODULE = 1 with another library name(test_sub), and by additionally loading it when using opt, it does work but I think there is a better way. Also AFAIK, using cmake is much easier for this case since it only needs to use this configuration in mypass/Makefile: add_llvm_loadable_module( mypass mypass.cpp sub/basic.cpp ) But for some reasons I have to use GNU make to do this. Is there a good way? -- Regards, Hongxu Chen From dimitry at andric.com Mon Jul 22 04:02:20 2013 From: dimitry at andric.com (Dimitry Andric) Date: Mon, 22 Jul 2013 13:02:20 +0200 Subject: [LLVMdev] Build Clang and LLVM on Win 8 In-Reply-To: References: <51EC3731.1040104@vidya.it> <874nbnk2e8.fsf@wanadoo.es> Message-ID: On Jul 22, 2013, at 02:11, Reid Kleckner wrote: > On Sun, Jul 21, 2013 at 5:51 PM, Óscar Fuentes wrote: > Giorgio Franceschetti writes: ... > I think that you installed the wrong version of Python. IIRC llvm-build > requires Python 2.X > > There was a patch on the commit list to try to make some of our scripts work for both 2 and 3. I should dig it up and review it. See http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130715/181547.html > My initial impression was that still probably nobody uses python 3, so it's not worth adding support that will break. But if users actually have python 3, maybe it's worth it. I guess it is slow in coming, but when it gains a certain critical mass, the 2.x series will be doomed quickly. In any case, the patch looks fairly minimal, except for globally replacing "print foo" with "print(foo)", which I wish the Python guys hadn't done.
:-) -Dimitry From dwiberg at gmail.com Mon Jul 22 04:35:23 2013 From: dwiberg at gmail.com (David Wiberg) Date: Mon, 22 Jul 2013 13:35:23 +0200 Subject: [LLVMdev] Inst field in MSP430InstrFormats.td In-Reply-To: <35C8E458-E1F7-4527-812A-B0508BAC5E82@mac.com> References: <35C8E458-E1F7-4527-812A-B0508BAC5E82@mac.com> Message-ID: 2013/7/22 Owen Anderson : > The Inst field is used to specify instruction encodings, which are then used to generate assemblers and disassemblers. I'm not sure offhand, but it's possible that the MSP430 backend doesn't make use of an auto-generated assembler. > > --Owen > Thanks for the explanation. I was expecting fields which are used by non target specific code to be contained in the "Instruction" class in "Target.td" (compare to e.g. DecoderMethod). Is it the need to support different sizes which requires it to be created in the target specific file? Is there any documentation available with regards to which fields are available and what they do? / David From h.bakiras at gmail.com Mon Jul 22 06:17:57 2013 From: h.bakiras at gmail.com (Harris BAKIRAS) Date: Mon, 22 Jul 2013 15:17:57 +0200 Subject: [LLVMdev] error on compiling toy-vm In-Reply-To: References: Message-ID: <51ED3105.7020106@gmail.com> Hello, The tutorial has been updated and is now compatible for LLVM release 3.3. So the issue you are facing should have been fixed. Just update your vmkit to get the latest revision. 
Regards, Harris Bakiras On 07/21/2013 12:30 PM, m kh wrote: > > Hi all, > > The make command errors out: > > [toyVM ./tools/toyVM]: Generating frame tables initializer for > .build/toyVM-binary.s > [toyVM ./tools/toyVM]: Compiling .build/GenFrametables.cc > [toyVM ./tools/toyVM]: Linking ../../Release/bin/toyVM > clang: error: no such file or directory: > '/home/user/vmkit/Release+Asserts/lib/Release/lib/libInlineMMTk.a' > make[2]: *** [../../Release/bin/toyVM] Error 1 > make[2]: Leaving directory `/home/user/vmkit/www/tuto/toy-vm2/tools/toyVM' > make[1]: *** [all-subs] Error 2 > make[1]: Leaving directory `/home/user/vmkit/www/tuto/toy-vm2/tools' > make: *** [all-subs] Error 2 > > > > Best regards, > Mkh > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From h.bakiras at gmail.com Mon Jul 22 07:07:52 2013 From: h.bakiras at gmail.com (Harris BAKIRAS) Date: Mon, 22 Jul 2013 16:07:52 +0200 Subject: [LLVMdev] Compiling "vmkit" on Ubuntu_x64 - Error: missing argument to --bindir In-Reply-To: References: <51E93D97.10103@gmail.com> <51E94DD2.4060802@gmail.com> Message-ID: <51ED3CB8.8030001@gmail.com> Hello Kumar, Unfortunately we never experienced on ARM architecture and we are not planning to port VMKit on ARM for the moment. Regards, Harris Bakiras On 07/19/2013 05:50 PM, Kumar Sukhani wrote: > I am working on a project to port JRuby on Embedded systems. JRuby > converts Ruby code to bytecode which is executed by any JVM. For this > project I am testing performance of JRuby with various available JVMs. > I have chosen ARM architecture. > Does vmkit support ARM architecture? > > On Fri, Jul 19, 2013 at 8:01 PM, Harris BAKIRAS > wrote: > > I don't know how JRuby works, maybe it uses some new feature that > GNU Classpath does not provide. 
> > VMKit's openJDK version is unstable on 64 bits since package > version 6b27. > You can still use it for very small programs which does not need > GC but that's all. > > It works fine on 32 bits. > So you can try it on 32 bits or revert your java version to a > previous one (< than 6b27) to test it on 64 bits. > > We are working on fixing the 64 bits issue as soon as possible. > > Harris Bakiras > > On 07/19/2013 03:47 PM, Kumar Sukhani wrote: >> Hi Harris Bakiras, >> Thanks for reply. It working now. >> Actually I wanted to try vmkit VM to run jruby codes. >> >> vmkit is able to run Java program, but when I try to run JRuby >> code then I get following error - >> >> root at komal:/home/komal/Desktop/GSOC/programs# jruby hello.rb >> >> Platform.java:39:in `getPackageName': >> java.lang.NullPointerException >> >> from ConstantSet.java:84:in `getEnumClass' >> >> from ConstantSet.java:60:in `getConstantSet' >> >> from ConstantResolver.java:181:in `getConstants' >> >> from ConstantResolver.java:102:in `getConstant' >> >> from ConstantResolver.java:146:in `intValue' >> >> from OpenFlags.java:28:in `value' >> >> from RubyFile.java:254:in `createFileClass' >> >> from Ruby.java:1273:in `initCore' >> >> from Ruby.java:1101:in `bootstrap' >> >> from Ruby.java:1079:in `init' >> >> from Ruby.java:179:in `newInstance' >> >> from Main.java:217:in `run' >> >> from Main.java:128:in `run' >> >> from Main.java:97:in `main' >> >> >> Can you tell me what will be the issue ? >> Vmkit doesn't work with OpenJDK ? >> >> On Fri, Jul 19, 2013 at 6:52 PM, Harris BAKIRAS >> > wrote: >> >> Hi Kumar, >> >> There is an error on your configuration line, you should >> provide the path to llvm-config binary instead of configure file. >> Assuming that you compiled llvm in release mode, the >> llvm-config binary is located in : >> >> YOUR_PATH_TO_LLVM/Release+Asserts/bin/llvm-config >> >> Try to change the -with-llvm-config-path option and it will >> compile. 
>> >> Harris Bakiras >> >> On 07/19/2013 02:36 PM, Kumar Sukhani wrote: >>> To compile vmkit on Ubuntu 12.04 64-bit machine, I followed >>> the steps giving here >>> [1]. >>> but when I run ./configure I am getting following error- >>> >>> root at komal:/home/komal/Desktop/GSOC/vmkit/vmkit# >>> ./configure >>> -with-llvm-config-path=../llvm-3.3.src/configure >>> --with-gnu-classpath-glibj=/usr/local/classpath/share/classpath/glibj.zip >>> --with-gnu-classpath-libs=/usr/local/classpath/lib/classpath >>> >>> checking build system type... x86_64-unknown-linux-gnu >>> >>> checking host system type... x86_64-unknown-linux-gnu >>> >>> checking target system type... x86_64-unknown-linux-gnu >>> >>> checking type of operating system we're going to >>> host on... Linux >>> >>> configure: error: missing argument to --bindir >>> >>> configure: error: Cannot find (or not executable) >>> >>> >>> I tried searching it online but didn't got any similar issue. >>> >>> [1] http://vmkit.llvm.org/get_started.html >>> >>> -- >>> Kumar Sukhani >>> +919579650250 >>> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu >> http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> >> >> >> -- >> Kumar Sukhani >> +919579650250 > > > > > -- > Kumar Sukhani > +919579650250 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tobias at grosser.es Mon Jul 22 07:27:48 2013 From: tobias at grosser.es (Tobias Grosser) Date: Mon, 22 Jul 2013 07:27:48 -0700 Subject: [LLVMdev] Analysis of polly-detect overhead in oggenc In-Reply-To: <8e12161.64f9.140058f158f.Coremail.tanmx_star@yeah.net> References: <51E02B5C.3030208@grosser.es> <727cd06.251b.13fd9059546.Coremail.tanmx_star@yeah.net> <51E19CAF.4050500@grosser.es> <73bc0067.345f.13fdb666d45.Coremail.tanmx_star@yeah.net> <51E2352A.6090706@grosser.es> <7aa4b9a.3bcb.13fdc808c5f.Coremail.tanmx_star@yeah.net> <32e1ec0e.464a.13fddb6c5c7.Coremail.tanmx_star@yeah.net> <51E31F3E.3080501@grosser.es> <51EC1D0F.6000800@grosser.es> <1f586e46.3cca.1400439c171.Coremail.tanmx_star@yeah.net> <51ECB235.4060901@grosser.es> <8e12161.64f9.140058f158f.Coremail.tanmx_star@yeah.net> Message-ID: <51ED4164.9000008@grosser.es> On 07/22/2013 01:46 AM, Star Tan wrote: > At 2013-07-22 12:16:53,"Tobias Grosser" wrote: >> I propose two more patches: >> >> 1) Transform the INVALID macro into function calls, that format >> the text and that set LastFailure. > Translating the INVALID macro into function calls would complicate the operations for different counters. > For example, currently users can add a new counter by simply adding the following line: > BADSCOP_STAT(SimpleLoop, "Loop not in -loop-simplify form"); > But if we translate INVALID macro into function calls, then users have to add a function for this counter: > void INVLIAD_SimpleLoop (...). \ ^^ No uppercase in function names. > This is because we cannot use the following macro combination in function calls: > if (!Context.Verifying) \ > ++Bad##NAME##ForScop; > So, I do not think it is necessary to translate the INVALID macro into function calls. > Do you still think we should translate INVALID macro into a series of functions like "invalid_CFG, invalid_IndVar, invalid_IndEdge, ... ? In that case, I could provide a small patch file for this purpose -:)
In that case, I could provide a small patch file for this purpose -:) I think it would still be nice to get rid of this macro. We could probably have a default function that takes an enum to report different errors in the reportInvalid(enum errorKind) style. And then several others that would allow more complex formatting (e.g. reportInvalidAlias(AliasSet)). Especially the code after 'isMustAlias()' would be nice to move out of the main scop detection. However, this issue is not directly related to the speedup work, so you are welcome to skip it for now. (Btw. thanks for not blindly following my suggestions!) >> 2) Add checks at the beginning of those function calls and >> continue only if LogErrors is set > Those invalid log strings are used for two separate cases: > 1) The first case is for detailed debugging, which is controlled by the macro DEBUG(dbgs() << MESSAGE). In such a case, string operations will automatically skipped in normal execution mode with the following if-statement: > if (::llvm::DebugFlag && ::llvm::isCurrentDebugType(TYPE)) > That means string operations controlled by DEBUG will not execute in normal case, so we should not worry about it. > 2) The other case is for the variable "LastFailure", which is used only in GraphPrinter. Currently string operations for "LastFailure" always execute in normal cases. My idea is to put such string operations under the condition of "GraphPrinter" mode. For example, I would like to translate the "INVALID" macro into: > #define INVALID(NAME, MESSAGE) \ > do { \ > if (GraphViewMode) { \ > std::string Buf; \ > raw_string_ostream fmt(Buf); \ > fmt << MESSAGE; \ > fmt.flush(); \ > LastFailure = Buf; \ > } \ > DEBUG(dbgs() << MESSAGE); \ > DEBUG(dbgs() << "\n"); \ > assert(!Context.Verifying &&#NAME); \ > if (!Context.Verifying) \ > ++Bad##NAME##ForScop; \ > } while (0) Looks good. 
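For readers following the thread, the function-based scheme Tobias sketches above ("a default function that takes an enum ... in the reportInvalid(enum errorKind) style") could look roughly like the following. This is an illustrative sketch only — the names (`RejectKind`, `ScopDetectionLog`, `reportInvalid`, `reportInvalidAlias`) are invented for the example and are not Polly's actual API:

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical enum of rejection reasons, one per "Bad...ForScop" counter.
enum RejectKind { RejectCFG, RejectIndVar, RejectAlias, RejectSimpleLoop };

class ScopDetectionLog {
  bool LogErrors;          // only format strings when error tracking is on
  std::string LastFailure; // consumed by the graph printer
  std::map<RejectKind, unsigned> BadCount;

public:
  explicit ScopDetectionLog(bool LogErrors) : LogErrors(LogErrors) {}

  // Default reporter: always bump the statistics counter, but only pay for
  // the string formatting when -polly-detect-collect-errors (or a -polly-show
  // style option) enabled logging.
  void reportInvalid(RejectKind Kind, const std::string &Message) {
    ++BadCount[Kind];
    if (!LogErrors)
      return; // skip the expensive string work in the common case
    std::ostringstream Fmt;
    Fmt << Message;
    LastFailure = Fmt.str();
  }

  // More specific reporters can do richer formatting, in the spirit of the
  // reportInvalidAlias(AliasSet) suggestion above.
  void reportInvalidAlias(const std::string &PointerName) {
    reportInvalid(RejectAlias, "possible aliasing of " + PointerName);
  }

  const std::string &lastFailure() const { return LastFailure; }
  unsigned badCount(RejectKind Kind) { return BadCount[Kind]; }
};
```

The point of the design is visible in `reportInvalid`: the counter update is unconditional (matching the `++Bad##NAME##ForScop` behavior of the macro), while the message formatting is guarded, so normal compilations never touch the string machinery.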
> As you have suggested, we can construct the condition GraphViewMode with "-polly-show", "-polly-show-only", "polly-dot" and "polly-dot-only". However, I see all these options are defined as "static" variables in lib/RegisterPasses.cpp. Do you think I should translate these local variables into global variables or should I define another option like "-polly-dot-scop" in ScopDetection.cpp? You can define a new option -polly-detect-collect-errors that enables the error tracking. Adding cl::location to this option allows you to store the option value externally. You can use this to automatically set this option, in case in lib/RegisterPasses.cpp in case -polly-show, -polly-show-only, ... have been set. Cheers, Tobias From kumarsukhani at gmail.com Mon Jul 22 07:46:05 2013 From: kumarsukhani at gmail.com (Kumar Sukhani) Date: Mon, 22 Jul 2013 20:16:05 +0530 Subject: [LLVMdev] Compiling "vmkit" on Ubuntu_x64 - Error: missing argument to --bindir In-Reply-To: <51ED3CB8.8030001@gmail.com> References: <51E93D97.10103@gmail.com> <51E94DD2.4060802@gmail.com> <51ED3CB8.8030001@gmail.com> Message-ID: here its mentioned that its portable on ARM. So simply cross-compiling will work? On Mon, Jul 22, 2013 at 7:37 PM, Harris BAKIRAS wrote: > Hello Kumar, > > Unfortunately we never experienced on ARM architecture and we are not > planning to port VMKit on ARM for the moment. > > Regards, > > Harris Bakiras > > On 07/19/2013 05:50 PM, Kumar Sukhani wrote: > > I am working on a project to port JRuby on Embedded systems. JRuby > converts Ruby code to bytecode which is executed by any JVM. For this > project I am testing performance of JRuby with various available JVMs. I > have chosen ARM architecture. > Does vmkit support ARM architecture? > > On Fri, Jul 19, 2013 at 8:01 PM, Harris BAKIRAS wrote: > >> I don't know how JRuby works, maybe it uses some new feature that GNU >> Classpath does not provide. >> >> VMKit's openJDK version is unstable on 64 bits since package version 6b27. 
>> You can still use it for very small programs which does not need GC but >> that's all. >> >> It works fine on 32 bits. >> So you can try it on 32 bits or revert your java version to a previous >> one (< than 6b27) to test it on 64 bits. >> >> We are working on fixing the 64 bits issue as soon as possible. >> >> Harris Bakiras >> >> On 07/19/2013 03:47 PM, Kumar Sukhani wrote: >> >> Hi Harris Bakiras, >> Thanks for reply. It working now. >> Actually I wanted to try vmkit VM to run jruby codes. >> >> vmkit is able to run Java program, but when I try to run JRuby code >> then I get following error - >> >> root at komal:/home/komal/Desktop/GSOC/programs# jruby hello.rb >>> >>> Platform.java:39:in `getPackageName': java.lang.NullPointerException >>> >>> from ConstantSet.java:84:in `getEnumClass' >>> >>> from ConstantSet.java:60:in `getConstantSet' >>> >>> from ConstantResolver.java:181:in `getConstants' >>> >>> from ConstantResolver.java:102:in `getConstant' >>> >>> from ConstantResolver.java:146:in `intValue' >>> >>> from OpenFlags.java:28:in `value' >>> >>> from RubyFile.java:254:in `createFileClass' >>> >>> from Ruby.java:1273:in `initCore' >>> >>> from Ruby.java:1101:in `bootstrap' >>> >>> from Ruby.java:1079:in `init' >>> >>> from Ruby.java:179:in `newInstance' >>> >>> from Main.java:217:in `run' >>> >>> from Main.java:128:in `run' >>> >>> from Main.java:97:in `main' >>> >>> >> Can you tell me what will be the issue ? >> Vmkit doesn't work with OpenJDK ? >> >> On Fri, Jul 19, 2013 at 6:52 PM, Harris BAKIRAS wrote: >> >>> Hi Kumar, >>> >>> There is an error on your configuration line, you should provide the >>> path to llvm-config binary instead of configure file. >>> Assuming that you compiled llvm in release mode, the llvm-config binary >>> is located in : >>> >>> YOUR_PATH_TO_LLVM/Release+Asserts/bin/llvm-config >>> >>> Try to change the -with-llvm-config-path option and it will compile. 
>>> >>> Harris Bakiras >>> >>> On 07/19/2013 02:36 PM, Kumar Sukhani wrote: >>> >>> To compile vmkit on Ubuntu 12.04 64-bit machine, I followed the steps >>> giving here [1]. >>> but when I run ./configure I am getting following error- >>> >>> root at komal:/home/komal/Desktop/GSOC/vmkit/vmkit# ./configure >>>>> -with-llvm-config-path=../llvm-3.3.src/configure >>>>> --with-gnu-classpath-glibj=/usr/local/classpath/share/classpath/glibj.zip >>>>> --with-gnu-classpath-libs=/usr/local/classpath/lib/classpath >>>> >>>> checking build system type... x86_64-unknown-linux-gnu >>>> >>>> checking host system type... x86_64-unknown-linux-gnu >>>> >>>> checking target system type... x86_64-unknown-linux-gnu >>>> >>>> checking type of operating system we're going to host on... Linux >>>> >>>> configure: error: missing argument to --bindir >>>> >>>> configure: error: Cannot find (or not executable) >>>> >>>> >>> I tried searching it online but didn't got any similar issue. >>> >>> [1] http://vmkit.llvm.org/get_started.html >>> >>> -- >>> Kumar Sukhani >>> +919579650250 >>> >>> >>> _______________________________________________ >>> LLVM Developers mailing listLLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >>> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >>> >> >> >> -- >> Kumar Sukhani >> +919579650250 >> >> >> > > > -- > Kumar Sukhani > +919579650250 > > > -- Kumar Sukhani +919579650250 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gael.thomas00 at gmail.com Mon Jul 22 08:15:40 2013 From: gael.thomas00 at gmail.com (=?ISO-8859-1?Q?Ga=EBl_Thomas?=) Date: Mon, 22 Jul 2013 17:15:40 +0200 Subject: [LLVMdev] Compiling "vmkit" on Ubuntu_x64 - Error: missing argument to --bindir In-Reply-To: References: <51E93D97.10103@gmail.com> <51E94DD2.4060802@gmail.com> <51ED3CB8.8030001@gmail.com> Message-ID: Hi Kumar, It's a mistake, we will correct that, we haven't tested vmkit on arm. As LLVM supports arm, by cross-compiling vmkit in arm/linux, it could work, but you will probably have to adapt some few parts of the code. If you plan to make the port, feel free to send us a lot of patches :) Gaël Le 22 juil. 2013 17:09, "Kumar Sukhani" a écrit : > here its mentioned that its portable on ARM. So > simply cross-compiling will work? > > On Mon, Jul 22, 2013 at 7:37 PM, Harris BAKIRAS wrote: > >> Hello Kumar, >> >> Unfortunately we never experienced on ARM architecture and we are not >> planning to port VMKit on ARM for the moment. >> >> Regards, >> >> Harris Bakiras >> >> On 07/19/2013 05:50 PM, Kumar Sukhani wrote: >> >> I am working on a project to port JRuby on Embedded systems. JRuby >> converts Ruby code to bytecode which is executed by any JVM. For this >> project I am testing performance of JRuby with various available JVMs. I >> have chosen ARM architecture. >> Does vmkit support ARM architecture? >> >> On Fri, Jul 19, 2013 at 8:01 PM, Harris BAKIRAS wrote: >> >>> I don't know how JRuby works, maybe it uses some new feature that GNU >>> Classpath does not provide. >>> >>> VMKit's openJDK version is unstable on 64 bits since package version >>> 6b27. >>> You can still use it for very small programs which does not need GC but >>> that's all. >>> >>> It works fine on 32 bits. >>> So you can try it on 32 bits or revert your java version to a previous >>> one (< than 6b27) to test it on 64 bits. >>> >>> We are working on fixing the 64 bits issue as soon as possible. 
>>> >>> Harris Bakiras >>> >>> On 07/19/2013 03:47 PM, Kumar Sukhani wrote: >>> >>> Hi Harris Bakiras, >>> Thanks for reply. It working now. >>> Actually I wanted to try vmkit VM to run jruby codes. >>> >>> vmkit is able to run Java program, but when I try to run JRuby code >>> then I get following error - >>> >>> root at komal:/home/komal/Desktop/GSOC/programs# jruby hello.rb >>>> >>>> Platform.java:39:in `getPackageName': java.lang.NullPointerException >>>> >>>> from ConstantSet.java:84:in `getEnumClass' >>>> >>>> from ConstantSet.java:60:in `getConstantSet' >>>> >>>> from ConstantResolver.java:181:in `getConstants' >>>> >>>> from ConstantResolver.java:102:in `getConstant' >>>> >>>> from ConstantResolver.java:146:in `intValue' >>>> >>>> from OpenFlags.java:28:in `value' >>>> >>>> from RubyFile.java:254:in `createFileClass' >>>> >>>> from Ruby.java:1273:in `initCore' >>>> >>>> from Ruby.java:1101:in `bootstrap' >>>> >>>> from Ruby.java:1079:in `init' >>>> >>>> from Ruby.java:179:in `newInstance' >>>> >>>> from Main.java:217:in `run' >>>> >>>> from Main.java:128:in `run' >>>> >>>> from Main.java:97:in `main' >>>> >>>> >>> Can you tell me what will be the issue ? >>> Vmkit doesn't work with OpenJDK ? >>> >>> On Fri, Jul 19, 2013 at 6:52 PM, Harris BAKIRAS wrote: >>> >>>> Hi Kumar, >>>> >>>> There is an error on your configuration line, you should provide the >>>> path to llvm-config binary instead of configure file. >>>> Assuming that you compiled llvm in release mode, the llvm-config binary >>>> is located in : >>>> >>>> YOUR_PATH_TO_LLVM/Release+Asserts/bin/llvm-config >>>> >>>> Try to change the -with-llvm-config-path option and it will compile. >>>> >>>> Harris Bakiras >>>> >>>> On 07/19/2013 02:36 PM, Kumar Sukhani wrote: >>>> >>>> To compile vmkit on Ubuntu 12.04 64-bit machine, I followed the steps >>>> giving here [1]. 
>>>> but when I run ./configure I am getting following error- >>>> >>>> root at komal:/home/komal/Desktop/GSOC/vmkit/vmkit# ./configure >>>>>> -with-llvm-config-path=../llvm-3.3.src/configure >>>>>> --with-gnu-classpath-glibj=/usr/local/classpath/share/classpath/glibj.zip >>>>>> --with-gnu-classpath-libs=/usr/local/classpath/lib/classpath >>>>> >>>>> checking build system type... x86_64-unknown-linux-gnu >>>>> >>>>> checking host system type... x86_64-unknown-linux-gnu >>>>> >>>>> checking target system type... x86_64-unknown-linux-gnu >>>>> >>>>> checking type of operating system we're going to host on... Linux >>>>> >>>>> configure: error: missing argument to --bindir >>>>> >>>>> configure: error: Cannot find (or not executable) >>>>> >>>>> >>>> I tried searching it online but didn't got any similar issue. >>>> >>>> [1] http://vmkit.llvm.org/get_started.html >>>> >>>> -- >>>> Kumar Sukhani >>>> +919579650250 >>>> >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing listLLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>> >>>> >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>> >>>> >>> >>> >>> -- >>> Kumar Sukhani >>> +919579650250 >>> >>> >>> >> >> >> -- >> Kumar Sukhani >> +919579650250 >> >> >> > > > -- > Kumar Sukhani > +919579650250 > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From verena at codeplay.com Mon Jul 22 09:18:53 2013 From: verena at codeplay.com (Verena Beckham) Date: Mon, 22 Jul 2013 17:18:53 +0100 Subject: [LLVMdev] Predication bug in AggressiveAntiDepBreaker? 
Message-ID: <51ED5B6D.1060400@codeplay.com> Hi, I wondered whether the AggressiveAntiDepBreaker can properly handle predicated instructions. At the end of PrescanInstruction the "DefIndices" array is updated with the destination register without checking whether the instruction is predicated. That shortens the live range: Later on, in HandleLastUse we check whether the register IsLive, which considers only "KillIndices" and "DefIndices", and therefore returns False for the interval between the predicated instruction and any non-predicated read before it. So that read is considered the last use. In my example this leads to a register not being fully renamed everywhere. I don't think a predicated write should count as a define. The attached patch fixes this test case. Or am I missing something target dependent here? -- Verena Beckham Vice President Engineering Codeplay Software Ltd 45 York Place, Edinburgh, EH1 3HP Tel: 0131 466 0503 Fax: 0131 557 6600 Website: http://www.codeplay.com This email and any attachments may contain confidential and /or privileged information and is for use by the addressee only. If you are not the intended recipient, please notify Codeplay Software Ltd immediately and delete the message from your computer. You may not copy or forward it,or use or disclose its contents to any other person. Any views or other information in this message which do not relate to our business are not authorized by Codeplay software Ltd, nor does this message form part of any contract unless so stated. As internet communications are capable of data corruption Codeplay Software Ltd does not accept any responsibility for any changes made to this message after it was sent. Please note that Codeplay Software Ltd does not accept any liability or responsibility for viruses and it is your responsibility to scan any attachments. 
Company registered in England and Wales, number: 04567874 Registered office: 81 Linkfield Street, Redhill RH1 6BY -------------- next part --------------
Index: AggressiveAntiDepBreaker.cpp
===================================================================
--- AggressiveAntiDepBreaker.cpp (revision 186828)
+++ AggressiveAntiDepBreaker.cpp (working copy)
@@ -399,7 +399,7 @@
     unsigned Reg = MO.getReg();
     if (Reg == 0) continue;
     // Ignore KILLs and passthru registers for liveness...
-    if (MI->isKill() || (PassthruRegs.count(Reg) != 0))
+    if (MI->isKill() || (PassthruRegs.count(Reg) != 0) || TII->isPredicated(MI))
       continue;
     // Update def for Reg and aliases.
From swlin at post.harvard.edu Mon Jul 22 10:19:04 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Mon, 22 Jul 2013 10:19:04 -0700 Subject: [LLVMdev] Inverse of ConstantFP::get and similar functions? Message-ID: Hi, I noticed that ConstantFP::get automatically returns the appropriately typed Constant depending on the LLVM type passed in (i.e. if called with a vector, it returns a splat vector with the given constant). Is there any simple way to do the inverse of this function? i.e., given an llvm::Value, check whether it is either a scalar of the given constant value or a splat vector with the given constant value? I can't seem to find any, and it doesn't look like the pattern matching interface provides something similar to this either. If this doesn't exist, then I propose adding static versions of all the various ConstantFoo::isBar() functions, which take a value as a parameter and check that the value is a constant of the appropriate type and value (checking for vectors matching the predicate in the vector case). For example: static bool ConstantFP::isExactlyValue(Value *V, double D); would return true if V is a ConstantFP, a splat ConstantVector, or a ConstantDataVector with the appropriate type.
Similarly, static bool ConstantFP::isZero(Value *V); would return true if V is a ConstantFP with zero of either sign, a ConstantVector or ConstantDataVector with all zeros of either sign, or a zero initializer... Anyone have any thoughts, and/or can point me to somewhere where this kind of thing is already implemented? Thanks, Stephen From hfinkel at anl.gov Mon Jul 22 10:44:19 2013 From: hfinkel at anl.gov (Hal Finkel) Date: Mon, 22 Jul 2013 12:44:19 -0500 (CDT) Subject: [LLVMdev] Inverse of ConstantFP::get and similar functions? In-Reply-To: Message-ID: <182049485.13457515.1374515059786.JavaMail.root@alcf.anl.gov> ----- Original Message ----- > Hi, > > I noticed that ConstantFP::get automatically returns the > appropriately > typed Constant depending on the LLVM type passed in (i.e. if called > with a vector, it returns a splat vector with the given constant). > > Is there any simple way to do the inverse of this function? i.e., > given an llvm::Value, check whether it is either a scalar of the given > constant value or a splat vector with the given constant value? I > can't seem to find any, and it doesn't look like the pattern matching > interface provides something similar to this either. > > If this doesn't exist, then I propose adding static versions of all > the various ConstantFoo::isBar() functions, which take a value as a > parameter and check that the value is a constant of the > appropriate > type and value (checking for vectors matching the predicate in the > vector case). > > For example: > > static bool ConstantFP::isExactlyValue(Value *V, double D); You can currently do this:

if (const ConstantVector *CV = dyn_cast<ConstantVector>(X))
  if (Constant *Splat = CV->getSplatValue())
    // Now you know that Splat is a splatted value, so check it for something.

-Hal > > would return true if V is a ConstantFP, a splat ConstantVector, or a > ConstantDataVector with the appropriate type.
Similarly, > > static bool ConstantFP::isZero(Value *V); > > would return true if V is a ConstantFP with zero of either sign, a > ConstantVector or ConstantDataVector with all zeros of either sign, > or > a zero initializer... > > Anyone have any thoughts, and/or can point me to somewhere where this > kind of thing is already implemented? > > Thanks, > Stephen > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory From swlin at post.harvard.edu Mon Jul 22 11:12:43 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Mon, 22 Jul 2013 11:12:43 -0700 Subject: [LLVMdev] Inverse of ConstantFP::get and similar functions? In-Reply-To: <182049485.13457515.1374515059786.JavaMail.root@alcf.anl.gov> References: <182049485.13457515.1374515059786.JavaMail.root@alcf.anl.gov> Message-ID:

> You can currently do this:
> if (const ConstantVector *CV = dyn_cast<ConstantVector>(X))
>   if (Constant *Splat = CV->getSplatValue())
>     // Now you know that Splat is a splatted value, so check it for something.

Yes, but that only handles the vector case; I would like to check for either the scalar constant or the splat vector, in the same way that ConstantFP::get creates either the scalar constant or splat vector. Also, that doesn't check for ConstantDataVector, either, which is the usual canonical form.
Basically, I want the alternative to the following:

static bool IsExactlyValueConstantFP(Value *V, double D) {
  ConstantFP *CFP;
  ConstantDataVector *CDV;
  ConstantVector *CV;
  return ((CFP = dyn_cast<ConstantFP>(V)) && CFP->isExactlyValue(D)) ||
         ((CDV = dyn_cast<ConstantDataVector>(V)) &&
          (CFP = dyn_cast_or_null<ConstantFP>(CDV->getSplatValue())) &&
          CFP->isExactlyValue(D)) ||
         ((CV = dyn_cast<ConstantVector>(V)) &&
          (CFP = dyn_cast_or_null<ConstantFP>(CV->getSplatValue())) &&
          CFP->isExactlyValue(D));
}

Stephen From bob.wilson at apple.com Mon Jul 22 11:36:54 2013 From: bob.wilson at apple.com (Bob Wilson) Date: Mon, 22 Jul 2013 11:36:54 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> Message-ID: <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> On Jul 21, 2013, at 8:00 PM, Eric Christopher wrote: > On Sat, Jul 20, 2013 at 9:15 PM, Chris Lattner wrote: >> Sorry, just getting caught up on an old thread. I haven't been involved in discussions of this. >> >> On Jul 17, 2013, at 8:53 AM, Bob Wilson wrote: >>> First, let me try to clarify my proposal, in case there was any confusion about that. LLVMContext already has a hook for diagnostics, setInlineAsmDiagnosticHandler() et al. I was suggesting that we rename those interfaces to be more generic, add a simple enumeration of whatever diagnostics can be produced from the backend, and add support in clang for mapping those enumeration values to the corresponding clang diagnostics. This would be a small amount of work and would also be consistent with everything you wrote above about reusing the standard and existing machinery for diagnostics in clang. >> >> Of all of the proposals discussed, I like this the best: >> >> 1) This is a really simple extension of what we already have. >> >> 2) The backend providing a set of enumerations for the classes of diagnostics it produces doesn't tie it to clang, and doesn't make it language specific.
Clients should be able to completely ignore the enum if they want the current (unclassified) behavior, and if an unknown enum value comes through, it is easy to handle. >> >> 3) I don't see how something like the stack size diagnostic can be implemented by clang calling into the backend. First, the MachineFunction (and thus, MachineFrameInfo) is a transient datastructure used by the backend when a function is compiled. There is nothing persistent for clang to query. Second, clang would have to know about all of the LLVM IR functions generated, which is possible, but impractical to track for things like thunks and other implicitly generated entrypoints. >> >> What is the specific concern with this approach? I don't see how this couples the backend to the frontend or causes layering violation problems. >> > > I've not talked with Chandler about this, but to sketch out the way > I'd do it (which is similar): > > Have the backend vend diagnostics, this can be done either with a set > of enums and messages like you mentioned, or just have a message and > location struct ala: > > struct Msg { > const char *Message; > Location Loc; > }; > > that the consumer of the message can use via a handler. When the consumer is clang, it's important that we have diagnostic groups to control the warnings, so the enum is important. We don't want to be comparing the message strings to decide whether a particular warning is an instance of -Wstack-size (for example). > > Alternately a handler (and we should have a default handler) can be > passed in from the printer of the message (the frontend in the case > provided) and it can be called on the error message. Absolutely this > should be done via the LLVMContext to deal with the case of parallel > function passes. 
> > class Handler { > void printDiagnostic(const char *Message, Location Loc); > }; > > (Note that I didn't say this was a fleshed out design ;) I think I > prefer the latter to the former and we'd just need an "diagnostic > callback handler" on the context. Though we would need to keep a set > of diagnostics that the backend handles. That said, that it provides > diagnostics based on its input language seems to make the most sense. > It can use the location metadata if it has it to produce a location, > otherwise you get the function, etc. in some sort of nicely degraded > quality. This is pretty much the same as what Quentin proposed (with the addition of the enum), isn't it? > > I think this scheme could also work as a way of dealing with the > "Optimization Diary" sort of use that Hal is envisioning as well. Yes, I agree. > > Keeping the separation of concerns around where the front end handles > diagnostics on what we'll call "source locations" is pretty important, > however, I agree that not every warning can be expressed this way, > i.e. the stack size diagnostic. However, leaving the printing up to > the front end is the best way to deal with this and a generic > diagnostic engine would probably help for things like llc/opt where > the backend just deals with its input language - IR. > > The existing inline asm diagnostics are ... problematic and it would > definitely be nice to get a generic interface for them. Though they're > actually separated into two separate cases where, I think, we end up > with our confusion: > > a) Front end diagnostics - This is an area that needs some work to be > decent, but it involves the front end querying the back end for things > like register size, valid immediates, etc and should be implemented by > the front end with an expanded set of target queries. We could use > this as a way to solidify the backend api for the MS inline asm > support as well and use some of that when parsing GNU style inline > asm. 
Yes, I agree about this, too. As an example, we had a discussion a few months ago about warning about over-aligned stack variables when dynamic stack realignment is disabled, and we agreed to move that warning into the frontend. I keep poking at the frontend team every once in a while to get some help implementing that, but I haven't forgotten it. > > b) Back end diagnostics - This is the stuff that the front end has no > hope of diagnosing. i.e. "ran out of registers", or "can't figure out > how to split this up into this kind of vector register". The latter > has always been a bit funny and I'm always unhappy with it, but I > don't have any better ideas. A unified scheme of communicating "help > help I'm being oppressed by the front end" in the backend would be, at > the very least, a step forward. > > Thoughts? I don't have any better ideas for this. At least if we generalize the existing inline asm diagnostic handler, it will make it less of a special case. From tom at stellard.net Mon Jul 22 11:50:53 2013 From: tom at stellard.net (Tom Stellard) Date: Mon, 22 Jul 2013 11:50:53 -0700 Subject: [LLVMdev] Questions about MachineScheduler Message-ID: <20130722185053.GB1714@L7-CNU1252LKR-172027226155.amd.com> Hi, I'm working on defining a SchedMachineModel for the Southern Islands family of GPUs, and I have two questions related to the MachineScheduler. 1. I have a resource that can process 15 instructions at the same time. In the TableGen definitions, should I do: def HWVMEM : ProcResource<15>; or let BufferSize = 15 in { def HWVMEM : ProcResource<1>; } 2. Southern Islands has 256 registers, but there is a significant performance penalty if you use more than a certain amount. Do any of the MachineSchedulers support switching into an 'optimize for register pressure mode' once it detects register pressure above a certain limit? 
Thanks, Tom From atrick at apple.com Mon Jul 22 12:01:16 2013 From: atrick at apple.com (Andrew Trick) Date: Mon, 22 Jul 2013 12:01:16 -0700 Subject: [LLVMdev] ScalarEvolution, HowManyLessThans question for step > 1 In-Reply-To: <239201cdc42c$0af5d2e0$20e178a0$@codeaurora.org> References: <239201cdc42c$0af5d2e0$20e178a0$@codeaurora.org> Message-ID: On Nov 16, 2012, at 10:56 AM, Brendon Cahoon wrote: > Hi, > > I have a question about some code in ScalarEvolution.cpp, in the function HowManyLessThans. Specifically, this function returns CouldNotCompute if the iteration step is greater than one even when the NoWrap flags are set (if the step goes past the limit and wraps). Here’s the comment: > > } else if (isKnownPositive(Step)) { > // Test whether a positive iteration can step past the limit > // value and past the maximum value for its type in a single step. > // Note that it's not sufficient to check NoWrap here, because even > // though the value after a wrap is undefined, it's not undefined > // behavior, so if wrap does occur, the loop could either terminate or > // loop infinitely, but in either case, the loop is guaranteed to > // iterate at least until the iteration where the wrapping occurs. > > I have no doubt that there is a good reason for this code to work this way (there is a specific check in trip-count9.ll). I’m just trying to understand it better. I’m assuming that when the previous version of this code checked the NoWrap flag, that it caused a problem with some optimization pass? If so, does anyone recall the pass. > > I’m also curious if the –f[no]-strict-overflow flag has any impact on the assumptions? Or, could the function check NoWrap for the unsigned case only? > > The motivating example for this question occurs because we’re not unrolling loops with a run-time trip count when the loop induction variable is a pointer and the loop end check is another pointer. 
For example, > > void ex( int n, int* a, int* c ) { > int b = 1; int *ae = a + n; > while ( a < ae) { > c ++; b += *c; *a++ = b; > } > } > > Thanks, > Brendon Someone pointed me to this message that lacked a response. Sorry for not reading llvmdev that week in the distant past, but for the record... ‘a’ can exceed ‘ae’ without undefined behavior if the pointers were somehow misaligned. Then hypothetically ‘a’ can step off the end of the address space (unsigned wrap). The key to understanding the problem is that SCEV NoWrap flag guarantees that ‘a’ can never surpass its original value, regardless of signed or unsigned wrapping. It’s doesn’t mean that signed/unsigned wrap can’t occur. SCEV’s thinking here is that the loop may continue to iterate at this point, modifying program state in some valid way, then later terminate via some other mechanism. Then there’s the whole issue of what undefined behavior means in LLVM. Are we beholden to execute all well-defined loop iterations even if we can prove that undefined behavior will eventually occur? What SCEV doesn’t realize is: comparing a < ae is already undefined if ‘a’ just unsigned-wrapped since it’s a pointer comparison and the step was positive (pointers unsigned-wrap all the time with negative steps). It also can’t see that dereferencing ‘a’ would result in undefined behavior if a positive step led to unsigned wrap (I guess I’m assuming that a single object can’t span more than half the address space). SCEV also has no way to communicate to a client, like unrolling, that if the loop test is the only possible way for the loop to terminate in a well-defined way, then the trip count can be guaranteed. So, I *do* think we could handle this case. But to do it we need to either add intelligence to SCEV (which is tricky to reason about) or add some knowledge about the whole loop when asking for the trip count (which is tricky to engineer). It wouldn’t hurt to file a bug. 
I don’t see a lot of people hacking on SCEV, but you never know. -Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: From hfinkel at anl.gov Mon Jul 22 12:15:53 2013 From: hfinkel at anl.gov (Hal Finkel) Date: Mon, 22 Jul 2013 14:15:53 -0500 (CDT) Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: Message-ID: <530329828.13482242.1374520553113.JavaMail.root@alcf.anl.gov> ----- Original Message ----- > On Sat, Jul 20, 2013 at 9:15 PM, Chris Lattner > wrote: > > Sorry, just getting caught up on an old thread. I haven't been > > involved in discussions of this. > > > > On Jul 17, 2013, at 8:53 AM, Bob Wilson > > wrote: > >> First, let me try to clarify my proposal, in case there was any > >> confusion about that. LLVMContext already has a hook for > >> diagnostics, setInlineAsmDiagnosticHandler() et al. I was > >> suggesting that we rename those interfaces to be more generic, > >> add a simple enumeration of whatever diagnostics can be produced > >> from the backend, and add support in clang for mapping those > >> enumeration values to the corresponding clang diagnostics. This > >> would be a small amount of work and would also be consistent with > >> everything you wrote above about reusing the standard and > >> existing machinery for diagnostics in clang. > > > > Of all of the proposals discussed, I like this the best: > > > > 1) This is a really simple extension of what we already have. > > > > 2) The backend providing a set of enumerations for the classes of > > diagnostics it produces doesn't tie it to clang, and doesn't make > > it language specific. Clients should be able to completely ignore > > the enum if they want the current (unclassified) behavior, and if > > an unknown enum value comes through, it is easy to handle. > > > > 3) I don't see how something like the stack size diagnostic can be > > implemented by clang calling into the backend. 
> > First, the MachineFunction (and thus, MachineFrameInfo) is a transient
> > data structure used by the backend when a function is compiled.
> > There is nothing persistent for clang to query. Second, clang
> > would have to know about all of the LLVM IR functions generated,
> > which is possible, but impractical to track for things like thunks
> > and other implicitly generated entrypoints.
> >
> > What is the specific concern with this approach? I don't see how
> > this couples the backend to the frontend or causes layering
> > violation problems.
>
> I've not talked with Chandler about this, but to sketch out the way
> I'd do it (which is similar):
>
> Have the backend vend diagnostics; this can be done either with a set
> of enums and messages like you mentioned, or just have a message and
> location struct a la:
>
> struct Msg {
>   const char *Message;
>   Location Loc;
> };
>
> that the consumer of the message can use via a handler.
>
> Alternately a handler (and we should have a default handler) can be
> passed in from the printer of the message (the frontend in the case
> provided) and it can be called on the error message. Absolutely this
> should be done via the LLVMContext to deal with the case of parallel
> function passes.
>
> class Handler {
>   void printDiagnostic(const char *Message, Location Loc);
> };
>
> (Note that I didn't say this was a fleshed out design ;) I think I
> prefer the latter to the former and we'd just need a "diagnostic
> callback handler" on the context. Though we would need to keep a set
> of diagnostics that the backend handles. That said, that it provides
> diagnostics based on its input language seems to make the most sense.
> It can use the location metadata if it has it to produce a location;
> otherwise you get the function, etc. in some sort of nicely degraded
> quality.
>
> I think this scheme could also work as a way of dealing with the
> "Optimization Diary" sort of use that Hal is envisioning as well.
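The sketch above is easy to prototype outside of LLVM. The toy mock below (every name in it is mine, not LLVM API) shows the proposed shape: a context object owns a registered handler, the "backend" reports a message plus location through it, and the consumer decides what to do with it:

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the proposal's pieces; none of this is
// actual LLVM API.
struct Location {
  std::string File;
  unsigned Line = 0;
};

struct Msg {
  std::string Message;
  Location Loc;
};

// Mock of an LLVMContext-like object that owns the diagnostic handler,
// as suggested for coping with parallel function passes.
class MockContext {
  std::function<void(const Msg &)> Handler;

public:
  void setDiagnosticHandler(std::function<void(const Msg &)> H) {
    Handler = std::move(H);
  }
  // Called by the "backend" when it has something to report.
  void report(Msg M) {
    if (Handler)
      Handler(std::move(M));
  }
};

// Example consumer: a front end collecting formatted messages instead
// of printing them directly.
std::vector<std::string> collectDiags(MockContext &Ctx) {
  std::vector<std::string> Seen;
  Ctx.setDiagnosticHandler([&Seen](const Msg &M) {
    Seen.push_back(M.Loc.File + ":" + std::to_string(M.Loc.Line) + ": " +
                   M.Message);
  });
  Ctx.report({"stack size exceeds limit", {"foo.c", 42}});
  return Seen;
}
```

The point of routing through the context object rather than a global is the one made above: with parallel function passes, each context can carry its own handler without cross-thread interference.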
I like "Optimization Diary" :)

Obviously, we sometimes lose debug information on variables while optimizing, and so the trick is to make it degrade nicely. Nevertheless, I think that we can often be more expressive than just using a single location for everything. How about something like this:

- The message string is text but a single kind of markup is allowed: , for example:
  "We cannot vectorize because is an unfriendly variable"
  (where the first will be replaced by text derived from a DIScope and the second from a DIVariable).

- The structure is this:

  struct Msg {
    const char *Message;
    Function *F; // If nothing else, we can extract a useful name from here, hopefully.
    SmallVector DIs; // Should be DIDescriptor* ?
  };

Then, in the backend, we can look for a DbgValueInst associated with the variable that we want (similar for scopes), and push a DIScope, DIVariable, etc. onto the array, one for every in the message string. The frontend can then render these references in whatever way seems most appropriate (which may include things like making them into hyperlinks, doing some kind of source-code highlighting, etc.).

Thanks again,
Hal

> Keeping the separation of concerns around where the front end handles
> diagnostics on what we'll call "source locations" is pretty important;
> however, I agree that not every warning can be expressed this way,
> i.e. the stack size diagnostic. However, leaving the printing up to
> the front end is the best way to deal with this and a generic
> diagnostic engine would probably help for things like llc/opt where
> the backend just deals with its input language - IR.
>
> The existing inline asm diagnostics are ... problematic and it would
> definitely be nice to get a generic interface for them.
Though > they're > actually separated into two separate cases where, I think, we end up > with our confusion: > > a) Front end diagnostics - This is an area that needs some work to be > decent, but it involves the front end querying the back end for > things > like register size, valid immediates, etc and should be implemented > by > the front end with an expanded set of target queries. We could use > this as a way to solidify the backend api for the MS inline asm > support as well and use some of that when parsing GNU style inline > asm. > > b) Back end diagnostics - This is the stuff that the front end has no > hope of diagnosing. i.e. "ran out of registers", or "can't figure out > how to split this up into this kind of vector register". The latter > has always been a bit funny and I'm always unhappy with it, but I > don't have any better ideas. A unified scheme of communicating "help > help I'm being oppressed by the front end" in the backend would be, > at > the very least, a step forward. > > Thoughts? > > -eric > -- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory From echristo at gmail.com Mon Jul 22 13:26:52 2013 From: echristo at gmail.com (Eric Christopher) Date: Mon, 22 Jul 2013 13:26:52 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> Message-ID: >> Have the backend vend diagnostics, this can be done either with a set >> of enums and messages like you mentioned, or just have a message and >> location struct ala: >> >> struct Msg { >> const char *Message; >> Location Loc; >> }; >> >> that the consumer of the message can use via a handler. > > When the consumer is clang, it's important that we have diagnostic groups to control the warnings, so the enum is important. 
We don't want to be comparing the message strings to decide whether a particular warning is an instance of -Wstack-size (for example). > I think, in this case, that I'd want the registration function to handle the "give me these sorts of warnings" rather than it be a part of the message. I.e. you'd register for each class of warnings out of the backend maybe? > > This is pretty much the same as what Quentin proposed (with the addition of the enum), isn't it? > Pretty close yeah. >> >> a) Front end diagnostics - This is an area that needs some work to be >> decent, but it involves the front end querying the back end for things >> like register size, valid immediates, etc and should be implemented by >> the front end with an expanded set of target queries. We could use >> this as a way to solidify the backend api for the MS inline asm >> support as well and use some of that when parsing GNU style inline >> asm. > > Yes, I agree about this, too. As an example, we had a discussion a few months ago about warning about over-aligned stack variables when dynamic stack realignment is disabled, and we agreed to move that warning into the frontend. I keep poking at the frontend team every once in a while to get some help implementing that, but I haven't forgotten it. > *nod* This should be pretty easy to implement out of the front end at a first thought - "if we're creating an alloca with an alignment greater than the target and we've turned off dynamic realignment then warn". The trick is getting the backend data for such things etc. >> >> b) Back end diagnostics - This is the stuff that the front end has no >> hope of diagnosing. i.e. "ran out of registers", or "can't figure out >> how to split this up into this kind of vector register". The latter >> has always been a bit funny and I'm always unhappy with it, but I >> don't have any better ideas. 
A unified scheme of communicating "help >> help I'm being oppressed by the front end" in the backend would be, at >> the very least, a step forward. >> >> Thoughts? > > I don't have any better ideas for this. At least if we generalize the existing inline asm diagnostic handler, it will make it less of a special case. Ah, the thoughts was an "in general", but if you'd had ideas I'd have totally been up for them :) -eric From g.franceschetti at vidya.it Mon Jul 22 13:51:01 2013 From: g.franceschetti at vidya.it (Giorgio Franceschetti) Date: Mon, 22 Jul 2013 22:51:01 +0200 Subject: [LLVMdev] Build Clang and LLVM on Win 8 In-Reply-To: <87y58zidax.fsf@wanadoo.es> References: <51EC3731.1040104@vidya.it> <874nbnk2e8.fsf@wanadoo.es> <87y58zidax.fsf@wanadoo.es> Message-ID: <51ED9B35.4010304@vidya.it> Hi all, yes, I do not know python and I installed it only for being able to build LLVM. Now I have installed version 2.7. I tried with codeblock project generation, but I'm still getting errors. So I moved to visual studio as per "getting started" guide. I run the command: cmake -G "Visual Studio 11" ..\llvm from my build folder. It lists a lot of file not found during the execution, but at the end it does create th visual studio projects. Based on the web guide, it should be successful. First question, is it really? Then, I open visual studio and run the solution compilation. But, after a long time, I got a lot of errors stating that it is not possible to find the stdbool.h file + a few others. 
Example:

error C1083: Cannot open include file 'stdbool.h': No such file or directory (\llvm\projects\compiler-rt\lib\floatuntisf.c) \llvm\projects\compiler-rt\lib\int_lib.h 37 1 clang_rt.x86_64
error C2061: syntax error: identifier '__attribute__' (\llvm\projects\compiler-rt\lib\int_util.c) \llvm\projects\compiler-rt\lib\int_util.h 27 1 clang_rt.x86_64
error C2059: syntax error: ';' (\llvm\projects\compiler-rt\lib\int_util.c) \llvm\projects\compiler-rt\lib\int_util.h 27 1 clang_rt.x86_64
error C2182: 'noreturn': invalid use of type 'void' (\llvm\projects\compiler-rt\lib\int_util.c) \llvm\projects\compiler-rt\lib\int_util.h 27 1 clang_rt.x86_64
error C1083: Cannot open include file 'stdbool.h': No such file or directory (\llvm\projects\compiler-rt\lib\int_util.c) \llvm\projects\compiler-rt\lib\int_lib.h 37 1 clang_rt.x86_64

What could it be?

Any help is appreciated,
Giorgio

On 22/07/2013 03.38, Óscar Fuentes wrote:
> Reid Kleckner writes:
>
>> My initial impression was that still probably nobody uses python 3, so it's
>> not worth adding support that will break. But if users actually have
>> python 3, maybe it's worth it.
>
> I think that in this case the problem was not people who actually have
> python 3, but people who see Python as a requirement for building LLVM
> and go to python.org and download the "most recent" version, i.e. python
> 3, because they are unaware of the incompatibilities. Believe it or not,
> there are developers who don't know about the Python mess :-)
>
> If adding support for version 3 is problematic, a check that gives a
> helpful message would be a good start. If it can't be implemented in the
> python scripts, it could be implemented in the cmake/configure scripts.
>
> BTW, http://llvm.org/docs/GettingStarted.html mentions Python as a
> requirement for the automated test suite (not for the build.) Says
> version >=2.4.
A user reading that would assume that version 3.X is ok, > or no Python at all if he only wishes to play with LLVM. > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > From echristo at gmail.com Mon Jul 22 14:21:05 2013 From: echristo at gmail.com (Eric Christopher) Date: Mon, 22 Jul 2013 14:21:05 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> Message-ID: On Mon, Jul 22, 2013 at 1:26 PM, Eric Christopher wrote: >>> Have the backend vend diagnostics, this can be done either with a set >>> of enums and messages like you mentioned, or just have a message and >>> location struct ala: >>> >>> struct Msg { >>> const char *Message; >>> Location Loc; >>> }; >>> >>> that the consumer of the message can use via a handler. >> >> When the consumer is clang, it's important that we have diagnostic groups to control the warnings, so the enum is important. We don't want to be comparing the message strings to decide whether a particular warning is an instance of -Wstack-size (for example). >> > > I think, in this case, that I'd want the registration function to > handle the "give me these sorts of warnings" rather than it be a part > of the message. I.e. you'd register for each class of warnings out of > the backend maybe? > >> >> This is pretty much the same as what Quentin proposed (with the addition of the enum), isn't it? >> > > Pretty close yeah. > Another thought and alternate strategy for dealing with these sorts of things: A much more broad set of callback machinery that allows the backend to communicate values or other information back to the front end that can then decide what to do. 
We can define an interface around this, but instead of having the backend vending diagnostics we have the callback take a "do something with this value" which can just be "communicate it back to the front end" or a diagnostic callback can be passed down from the front end, etc. This will probably take a bit more design to get a general framework set up, but imagine the usefulness of say being able to automatically reschedule a jitted function to a thread with a larger default stack size if the callback states that the thread size was N+1 where N is the size of the stack for a thread you've created. Thoughts? -eric From echristo at gmail.com Mon Jul 22 14:24:34 2013 From: echristo at gmail.com (Eric Christopher) Date: Mon, 22 Jul 2013 14:24:34 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: <530329828.13482242.1374520553113.JavaMail.root@alcf.anl.gov> References: <530329828.13482242.1374520553113.JavaMail.root@alcf.anl.gov> Message-ID: >> I think this scheme could also work as a way of dealing with the >> "Optimization Diary" sort of use that Hal is envisioning as well. > > I like "Optimization Diary" :) > > Obviously, we sometimes loose debug information on variables while optimizing, and so the trick is to make it degrade nicely. Nevertheless, I think that we can often be more expressive than just using a single location for everything. How about something like this: > > - The message string is text but a single kind of markup is allowed: , for example: > "We cannot vectorize because is an unfriendly variable" > (where the first will be replaced by text derived from a DIScope and the second from a DIVariable). > > - The structure is this: > struct Msg { > const char *Message; > Function *F; // If nothing else, we can extract a useful name from here, hopefully. > SmallVector DIs; // Should be DIDescriptor* ? 
> }; > > Then, in the backend, we can look for a DbgValueInst associated with the variable that we want (similar for scopes), and push a DIScope, DIVariable, etc. onto the array, one for every in the message string. The frontend can then render these references in whatever way seems most appropriate (which may include things like making them into hyperlinks, doing some kind of source-code highlighting, etc.). > Yep, that's what I was going for. As a note you don't need to worry about the Function and DIs as a DebugLoc has a scope and can be traveled back to find a function that it's inside or any other enclosing scope. Though I've just posted another message to the thread with a design for a more informational set of callbacks that could be used by individual passes. I don't have a good design off the top of my head, but... -eric > Thanks again, > Hal > >> >> Keeping the separation of concerns around where the front end handles >> diagnostics on what we'll call "source locations" is pretty >> important, >> however, I agree that not every warning can be expressed this way, >> i.e. the stack size diagnostic. However, leaving the printing up to >> the front end is the best way to deal with this and a generic >> diagnostic engine would probably help for things like llc/opt where >> the backend just deals with its input language - IR. >> >> The existing inline asm diagnostics are ... problematic and it would >> definitely be nice to get a generic interface for them. Though >> they're >> actually separated into two separate cases where, I think, we end up >> with our confusion: >> >> a) Front end diagnostics - This is an area that needs some work to be >> decent, but it involves the front end querying the back end for >> things >> like register size, valid immediates, etc and should be implemented >> by >> the front end with an expanded set of target queries. 
We could use >> this as a way to solidify the backend api for the MS inline asm >> support as well and use some of that when parsing GNU style inline >> asm. >> >> b) Back end diagnostics - This is the stuff that the front end has no >> hope of diagnosing. i.e. "ran out of registers", or "can't figure out >> how to split this up into this kind of vector register". The latter >> has always been a bit funny and I'm always unhappy with it, but I >> don't have any better ideas. A unified scheme of communicating "help >> help I'm being oppressed by the front end" in the backend would be, >> at >> the very least, a step forward. >> >> Thoughts? >> >> -eric >> > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory From chandlerc at google.com Mon Jul 22 14:25:02 2013 From: chandlerc at google.com (Chandler Carruth) Date: Mon, 22 Jul 2013 14:25:02 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> Message-ID: On Mon, Jul 22, 2013 at 2:21 PM, Eric Christopher wrote: > >> This is pretty much the same as what Quentin proposed (with the > addition of the enum), isn't it? > >> > > > > Pretty close yeah. > > > > Another thought and alternate strategy for dealing with these sorts of > things: > > A much more broad set of callback machinery that allows the backend to > communicate values or other information back to the front end that can > then decide what to do. We can define an interface around this, but > instead of having the backend vending diagnostics we have the callback > take a "do something with this value" which can just be "communicate > it back to the front end" or a diagnostic callback can be passed down > from the front end, etc. 
>
> This will probably take a bit more design to get a general framework
> set up, but imagine the usefulness of, say, being able to automatically
> reschedule a jitted function to a thread with a larger default stack
> size if the callback states that the thread size was N+1 where N is
> the size of the stack for a thread you've created.

FWIW, *this* is what I was trying to get across. Not that it wouldn't be a callback-based mechanism, but that it should be a fully general mechanism rather than having something to do with warnings, errors, notes, etc. If a frontend chooses to use it to produce such diagnostics, cool, but there are other use cases that the same machinery should serve.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jevinsweval at gmail.com Mon Jul 22 14:29:12 2013
From: jevinsweval at gmail.com (Jevin Sweval)
Date: Mon, 22 Jul 2013 14:29:12 -0700
Subject: [LLVMdev] Advice to API inconsistency between different versions
In-Reply-To: References: Message-ID:

On Mon, Jul 22, 2013 at 2:21 AM, lyh.kernel wrote:
> Hello all,
>
> LLVM's API varies a lot from version to version. Take an example: the header
> llvm/Target/TargetData.h changed to llvm/DataLayout.h from LLVM version 3.1
> to version 3.2. This slices the program up like so:
>
> #if defined(LLVM_V31)
> #include <llvm/Target/TargetData.h>
> #elif defined(LLVM_V32)
> #include <llvm/DataLayout.h>
> #else
> #error NEED HEADER
> #endif
>
> The code is in a mess if I want to support previous LLVM versions. I am
> wondering how do you support different LLVM versions and keep the code clean
> as well?
>
> On the other hand, consider the example above. Do you usually check for the
> LLVM version (ex. LLVM_V31, LLVM_V32) or check for the feature instead, which
> uses m4's AC_CHECK_HEADER to detect whether the header exists during
> configuration?
>
> Thanks a lot

Greetings, My company investigated using an LLVM shim/wrapper to paper over API differences between versions.
We eventually concluded that our development time was better spent staying on ToT and taking advantage of IR's (so far) forward compatibility to consume IR generated by older versions. Luckily, we don't have to worry about our output IR being consumed by anything but our ToT llc. Cheers, Jevin From eli.friedman at gmail.com Mon Jul 22 14:57:09 2013 From: eli.friedman at gmail.com (Eli Friedman) Date: Mon, 22 Jul 2013 14:57:09 -0700 Subject: [LLVMdev] Inverse of ConstantFP::get and similar functions? In-Reply-To: References: Message-ID: On Mon, Jul 22, 2013 at 10:19 AM, Stephen Lin wrote: > Hi, > > I noticed that ConstantFP::get automatically returns the appropriately > types Constant depending on the LLVM type passed in (i.e. if called > with a vector, it returns a splat vector with the given constant). > > Is there any simple way to do the inverse of this function? i.e., > given a llvm::Value, check whether it is either a scalar of the given > constant value or a splat vector with the given constant value? I > can't seem to find any, and it doesn't look like the pattern matching > interface provides something similar to this either. > > If this doesn't exist, then I propose adding static versions of all > the various ConstantFoo::isBar() functions, which take a value as a > parameter and check that the value is of a constant of the appropriate > type and value (checking for vectors matching the predicate in the > vector case). > > For example: > > static bool ConstantFP::isExactlyValue(Value *V, double D); > > would return true is V is ConstantFP, a splat ConstantVector, or a > ConstantDataVector with the appropriate type. Similarly, > > static bool ConstantFP::isZero(Value *V); > > would return true if V is a ConstantFP with zero of either sign, a > ConstantVector or ConstantDataVector with all zeros of either sign, or > a zero initializer... > > Anyone have any thoughts, and/or can point me to somewhere where this > kind of thing is already implemented? 
We do already have Constant::isZeroValue(). There's also m_SpecificFP and m_AnyZero in PatternMatch.h. -Eli From swlin at post.harvard.edu Mon Jul 22 15:03:26 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Mon, 22 Jul 2013 15:03:26 -0700 Subject: [LLVMdev] Inverse of ConstantFP::get and similar functions? In-Reply-To: References: Message-ID: > We do already have Constant::isZeroValue(). There's also m_SpecificFP > and m_AnyZero in PatternMatch.h. > > -Eli Oh, the latter two are what I need, more or less. Thanks! Stephen From qcolombet at apple.com Mon Jul 22 16:17:27 2013 From: qcolombet at apple.com (Quentin Colombet) Date: Mon, 22 Jul 2013 16:17:27 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> Message-ID: Hi, Compared to my previous email, I have added Hal’s idea for formatting the message and pull back some idea from the "querying framework”. Indeed, I propose to add some information in the reporting so that a front-end (more generally a client) can filter the diagnostics or take proper actions. See the details hereafter. On Jul 22, 2013, at 2:25 PM, Chandler Carruth wrote: > > On Mon, Jul 22, 2013 at 2:21 PM, Eric Christopher wrote: > >> This is pretty much the same as what Quentin proposed (with the addition of the enum), isn't it? > >> > > > > Pretty close yeah. > > > > Another thought and alternate strategy for dealing with these sorts of things: > > A much more broad set of callback machinery that allows the backend to > communicate values or other information back to the front end that can > then decide what to do. We can define an interface around this, but > instead of having the backend vending diagnostics we have the callback > take a "do something with this value" which can just be "communicate > it back to the front end" or a diagnostic callback can be passed down > from the front end, etc. 
> > This will probably take a bit more design to get a general framework > set up, but imagine the usefulness of say being able to automatically > reschedule a jitted function to a thread with a larger default stack > size if the callback states that the thread size was N+1 where N is > the size of the stack for a thread you've created. > > FWIW, *this* is what I was trying to get across. Not that it wouldn't be a callback-based mechanism, but that it should be a fully general mechanism rather than having something to do with warnings, errors, notes, etc. If a frontend chooses to use it to produce such diagnostics, cool, but there are other use cases that the same machinery should serve. I like the general idea. To be sure I understood the proposal, let me give an example. ** Example ** The compiler says here is the size of the stack for Loc via a “handler” (“handler" in the sense whatever mechanism we come up to make such communication possible). Then the front-end builds the diagnostic from that information (or query for more if needed) or drops everything if it does not care about this size for instance (either it does not care at all or the size is small enough compared to its setting). ** Comments ** Unless we have one handler per -kind of - use, and I would like to avoid that, I think we should still provide an information on the severity of the thing we are reporting and what we are reporting. Basically: - Severity: Will the back-end abort after the information pass down or will it continue (the boolean of the previous proposal)? - Kind: What are we reporting (the enum from the previous proposal)? I also think we should be able to provide a default (formatted) message, such that a client that does not need to know what to do with the information can still print something somehow useful, especially on abort cases. Thus, it sounds a good thing to me to have a string with some markers to format the output plus the arguments to be used in the formatted output. 
Hal’s proposal could do the trick (although I do not know if DIDescriptors are the best thing to use here).

** Summary **

I am starting to think that we should be able to cover the reporting case plus some querying mechanism with something like:

void reportSomethingToFEHandler(enum Reporting Kind, bool IsAbort, , const char* DefaultMsg, )

Where is supposed to be the class/struct/pointer to the relevant information for this kind. If it is not enough, the FE should call additional APIs to get what it wants.

This looks similar to the “classical” back-end-reports-to-front-end approach, but gives more freedom to the front-end, as it can choose what to do based on the attached information. I also believe this will reduce the need to expose back-end APIs and speed up the process. However, the ability of the front-end (or client) to query the back-end is limited to the places where the back-end is reporting something. Also, if the back-end is meant to abort, the FE cannot do anything about it (e.g., the stack size is not big enough for the jitted function). That’s why I said it covers “some" querying mechanism.

** Concerns **

1. Testing. Assuming we will always emit these reports, relying on a front-end to filter out what is not currently relevant (e.g., we did not set the stack size warning in the FE), what will happen when we test (make check) without a front-end? I am afraid we will pollute all tests or have some difficulty testing a specific reporting.

2. Regarding a completely query-based approach, like Chris pointed out, I do not see how we can report consistent information at any given time. Also, Eric, coming back to your jit example, how could we prevent the back-end from aborting if the jitted function is too big for the stack?

3.
Back to the strictly reporting approach where we extend the inlineasm handler (the approach proposed by Bob and that I sketched a little bit more), now looks similar to this approach expect that the back-end chooses what it is relevant to report and the back-end does not need to pass down the information. The concern is how do we easily (in a robust and extendable manner) provide a front-end/back-end option for each warning/error? Thoughts? Cheers, -Quentin > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From nlewycky at google.com Mon Jul 22 16:28:43 2013 From: nlewycky at google.com (Nick Lewycky) Date: Mon, 22 Jul 2013 16:28:43 -0700 Subject: [LLVMdev] Does nounwind have semantics? In-Reply-To: <251BD6D4E6A77E4586B482B33960D22833608266@HASMSX106.ger.corp.intel.com> References: <58281658-BDAE-4D12-859A-859308EE6FF8@apple.com> <51ECB01C.5020605@mxc.ca> <251BD6D4E6A77E4586B482B33960D2283360813B@HASMSX106.ger.corp.intel.com> <51ECDE23.2020004@mxc.ca> <251BD6D4E6A77E4586B482B33960D22833608266@HASMSX106.ger.corp.intel.com> Message-ID: On 22 July 2013 01:11, Kuperstein, Michael M wrote: > Of course frontends are free to put attributes, but it would be nice if > optimizations actually used them. ;-) > My use case is that of proprietary frontend that happens to know some > library function calls - which are only resolved at link time - have no > side effects and are safe to execute speculatively, and wants to tell the > optimizer it can move them around however it likes. I'll gladly submit a > patch that uses these hints, but I'd like to reach some consensus on what > the desired attributes actually are first. The last thing I want is to add > attributes that are only useful to myself. > > Regarding having several orthogonal attributes vs. 
things like > "safetospeculate": > > To know a function is safe to speculatively execute, I need at least: > 1) readnone (readonly is insufficient, unless I know all accessed pointers > are valid) > 2) nounwind > 3) nolongjmp (I guess?) > 4) no undefined behavior. This includes things like "halting" and "no > division by zero", but that's not, by far, an exhaustive list. > > I guess there are several ways to handle (4). > Ideally, I agree with you, we'd like a set of orthogonal attributes that, > taken together, imply that the function's behavior is not undefined. > But that requires mapping all sources of undefined behavior (I don't think > this is currently documented for LLVM IR, at least not in a centralized > fashion) and adding a very specific attribute for each of them. I'm not > sure having function declarations with "readnone, nounwind, nolongjmp, > halting, nodivbyzero, nopoisonval, nocomparelabels, nounreachable, ..." is > desirable. > > We could also have a "welldefined" attribute and a "halting" attribute > where "welldefined" subsumes "halting", if the specific case of a function > which halts but may have undefined behavior is important. > While the two are not orthogonal, it's similar to the situation with > "readnone" and "readonly". Does that sound reasonable? > You're entirely right. I forgot about undefined behaviour. If you want a 'speculatable' attribute, I would review that patch. Please audit the intrinsics (at least the target-independent ones) and appropriate library functions for whether you can apply this attribute to them. I think the only optimization that it can trigger is that "isSafeToSpeculativelyExecute" returns true on it. Anything else? Is it safe to infer readnone and nounwind from speculatable? I should mention that speculatable functions are extraordinarily limited in what they can do in the general (non–LLVM-as-a-library) case. 
They may be hoisted above calls to fork or pthread_create, they may be moved into global constructors (and thus can't depend on global state), etc. However, since you have a specific library you want to generate code against, you have the power to make use of it. I don't expect clang or dragonegg to be able to make use of it. Nick > -----Original Message----- > From: Nick Lewycky [mailto:nicholas at mxc.ca] > Sent: Monday, July 22, 2013 10:24 > To: Kuperstein, Michael M > Cc: Andrew Trick; llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] Does nounwind have semantics? > > Kuperstein, Michael M wrote: > > I'm not sure I understand why it's blocked on that, by the way. > > It blocks our ability to automatically deduce the halting attribute in the > optimizer, which was necessary for the use case I had at the time. > If you have a use case of your own, feel free to propose the patch! > > (Technically it's not *blocked* -- see how my patch does it! -- but the > workarounds are too horrible to be committed.) > > > Even if we can't apply the attribute ourselves, I don't see why we > wouldn't expose that ability to frontends. > > Frontends are free to put attributes on functions if they want to. Go for > it! > > > I'm not entirely sure "halting" is the right attribute either, by the > way. > > What I, personally, would like to see is a way to specify a function > call is safe to speculatively execute. That implies readnone (not just > readonly), nounwind, halting - and Eris knows what else. Nick, is that too > strong for you? > > I strongly prefer the approach of having orthogonal attributes. There are > optimizations that you can do with each of these attributes on their own. > In particular I think that readonly+halting+nounwind+nolongjmp is going to > be common and I'd feel silly if we had a special case for > readnone+halting+nounwind+nolongjmp and thus couldn't optimize the more > common case. 
> > That said, I'm also going to feel silly if we don't end up with enough > attributes to allow isSafeToSpeculate to deduce it, which is where we > are right now. I was planning to get back to fixing this after > Chandler's promised PassManager work. > > Nick > > > > > Michael > > > > -----Original Message----- > > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Nick Lewycky > > Sent: Monday, July 22, 2013 07:08 > > To: Andrew Trick > > Cc: llvmdev at cs.uiuc.edu > > Subject: Re: [LLVMdev] Does nounwind have semantics? > > > > Andrew Trick wrote: > >> Does 'nounwind' have semantics that inform optimization passes? It > seems to in some cases, but not consistently. For example... > >> > >> int32_t foo(int32_t* ptr) { > >> int i = 0; > >> int result; > >> do { > >> bar(ptr); > >> result = *ptr; > >> bar(ptr); > >> } while (i++< *ptr); > >> return result; > >> } > >> > >> Say we have a front end that declares bar as... > >> > >> declare void @bar(i32*) readonly; > >> > >> So 'bar' is 'readonly' and 'may-unwind'. > >> > >> When LICM tries to hoist the load it interprets the 'may-unwind' as > "MayThrow" in LICM-language and bails. However, when it tries to sink the > call itself it sees the 'readonly', assumes no side effects and sinks it > below the loads. Hmm... > >> > >> There doesn't appear to be a way to declare a function that is > guaranteed not to write to memory in a way that affects the caller, but may > have another well-defined side effect like aborting the program. This is > interesting, because that is the way runtime checks for safe languages > would like to be defined. I'm perfectly happy telling front ends to > generate control flow for well-defined traps, since I like lots of basic > blocks in my IR. But I'm still curious how others deal with this. > > > > Yes, we went through a phase where people would try to use > "nounwind+readonly == no side-effects" to optimize. All such optimizations > are wrong. 
Unless otherwise proven, a function may inf-loop, terminate the > program, or longjmp. > > > > I tried to add 'halting' to help solve part of this a long time ago, but > it never went in. The problem is that determining whether you have loops > requires a FunctionPass (LoopInfo to find loops and SCEV to determine an > upper bound) and applying function attributes is an SCC operation (indeed, > an SCC is itself a loop), so it's all blocked behind fixing the PassManager > to allow CGSGGPasses to depend on FunctionPasses. > > > http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20100705/103670.html > > > > I'm now in a similar situation where I want 'nounwind' to mean "only > exits by terminating the program or a return instruction" but unfortunately > functions which longjmp are considered nounwind. I would like to change > llvm to make longjmp'ing a form of unwinding (an exceptional exit to the > function), but if I were to apply that rule today then we'd start putting > dwarf eh tables on all our C code, oops. > > > > Nick > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > --------------------------------------------------------------------- > > Intel Israel (74) Limited > > > > This e-mail and any attachments may contain confidential material for > > the sole use of the intended recipient(s). Any review or distribution > > by others is strictly prohibited. If you are not the intended > > recipient, please contact the sender and delete all copies. > > > > > > --------------------------------------------------------------------- > Intel Israel (74) Limited > > This e-mail and any attachments may contain confidential material for > the sole use of the intended recipient(s). Any review or distribution > by others is strictly prohibited. 
If you are not the intended > recipient, please contact the sender and delete all copies. > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From atrick at apple.com Mon Jul 22 19:12:04 2013 From: atrick at apple.com (Andrew Trick) Date: Mon, 22 Jul 2013 19:12:04 -0700 Subject: [LLVMdev] Questions about MachineScheduler In-Reply-To: <20130722185053.GB1714@L7-CNU1252LKR-172027226155.amd.com> References: <20130722185053.GB1714@L7-CNU1252LKR-172027226155.amd.com> Message-ID: <2BB94AA8-217F-4768-AE85-0756349F8337@apple.com> On Jul 22, 2013, at 11:50 AM, Tom Stellard wrote: > Hi, > > I'm working on defining a SchedMachineModel for the Southern Islands > family of GPUs, and I have two questions related to the > MachineScheduler. > > 1. I have a resource that can process 15 instructions at the same time. > In the TableGen definitions, should I do: > > def HWVMEM : ProcResource<15>; > or > > let BufferSize = 15 in { > def HWVMEM : ProcResource<1>; > } For in-order processors you always want BufferSize=0. In the current generic scheduler (ConvergingScheduler) it's effectively a boolean that specifies in-order vs. OOO. (I have code that models the buffers in an OOO processor, but I think it’s too heavy-weight to go in the scheduler. Maybe someday it can be an analysis tool.) let BufferSize = 0 in { def HWVMEM : ProcResource<15>; } Now since you’ll want to plug in your own scheduling strategy, how you interpret the machine model is mostly up to you. What the TargetSchedModel interface does for you is normalize the resources to processor cycles. This is exposed with scaling factors (to avoid division): getResourceFactor, getMicroOpFactor, getLatencyFactor.
So if you have def HW1 : ProcResource<15>; def HW2 : ProcResource<3>; LatencyFactor=15 ResourceFactor(HW1)=1 ResourceFactor(HW2)=5 > 2. Southern Islands has 256 registers, but there is a significant > performance penalty if you use more than a certain amount. Do any of > the MachineSchedulers support switching into an 'optimize for register > pressure mode' once it detects register pressure above a certain limit? The code in ConvergingScheduler (I’ll rename it to GenericScheduler soon) is meant to demonstrate most of the features so developers can copy what they need into their own strategy, add heuristics and change the underlying data structures, which often makes sense. You can decide whether you want only bottom-up, top-down, or both. For an in-order processor, I think this becomes much simpler. You do away with most of the complexity in ConvergingScheduler::SchedBoundary and implement a straightforward reservation table. If it’s fully pipelined then you just count resource units for the current cycle until one reaches the latency factor. If it’s not fully pipelined, then you need to define ResourceCycles in the machine’s SchedWrite definitions and implement a simple reservation table (mark earliest cycle at which a resource is used for bottom-up scheduling). Some of this can be made a generic utility, but it’s not much to implement. Since the strategy defines the priority queues, you can do whatever you want for your register pressure heuristics. From scanning the full queue each time with dynamic heuristics, to resorting, to dynamically deferring nodes... Note that the register pressure tracking is handled outside of the strategy, in ScheduleDAGMI. So you get this for free without duplication. However, querying pressure change for a candidate is done by the strategy. The generic interface, getMaxPressureDelta(), is very clunky now. 
I’m going to improve it, but if you’re writing a target-specific strategy, it’s probably easier to directly query a pressure set for a specific register class. e.g. RC = TRI->getRegClass(R) *PSetID = TRI->getRegClassPressureSets(RC) Now you have a raw pointer right into the machine model’s null-terminated array of PSetIDs that are affected by a register class (targets often have several overlapping register classes). You can choose one of those sets to track or track them all. I’m about to commit a patch that will have them sorted by number of regs in the set, so you can easily grab the largest (end of the list). Then you can directly query pressure for a specific set... P = RPTracker.getPressureAfterInst(I) diff = P[PSetID] - RPTracker.getRegSetPressureAtPos()[PSetID] Note that how you define your target’s registers can make a big difference in the pressure set formation. Yours don’t look too bad, but in general remember to use isAllocatable=0 for any classes that don’t take part in regalloc. -Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: From tom at stellard.net Mon Jul 22 19:18:18 2013 From: tom at stellard.net (Tom Stellard) Date: Mon, 22 Jul 2013 19:18:18 -0700 Subject: [LLVMdev] Questions about MachineScheduler In-Reply-To: <2BB94AA8-217F-4768-AE85-0756349F8337@apple.com> References: <20130722185053.GB1714@L7-CNU1252LKR-172027226155.amd.com> <2BB94AA8-217F-4768-AE85-0756349F8337@apple.com> Message-ID: <20130723021818.GC1714@L7-CNU1252LKR-172027226155.amd.com> On Mon, Jul 22, 2013 at 07:12:04PM -0700, Andrew Trick wrote: > > On Jul 22, 2013, at 11:50 AM, Tom Stellard wrote: > > > Hi, > > > > I'm working on defining a SchedMachineModel for the Southern Islands > > family of GPUs, and I have two questions related to the > > MachineScheduler. > > > > 1. I have a resource that can process 15 instructions at the same time.
> > In the TableGen definitions, should I do: > > > > def HWVMEM : ProcResource<15>; > > or > > > > let BufferSize = 15 in { > > def HWVMEM : ProcResource<1>; > > } > > For in-order processors you always want BufferSize=0. In the current generic scheduler (ConvergingScheduler) it's effectively a boolean that specifies inorder vs OOO. (I have code that models the buffers in an OOO processor, but I think it’s too heavy-weight to go in the scheduler. Maybe someday it can be an analysis tool.) > > let BufferSize = 0 { > def HWVMEM : ProcResource<15>; > } > > Now since you’ll want to plugin your own scheduling strategy, how you interpret the machine model is mostly up to you. What the TargetSchedModel interface does for you is normalize the resources to processor cycles. This is exposed with scaling factors (to avoid division): getResourceFactor, getMicroOpFactor, getLatencyFactor. > > So if you have > def HW1 : ProcResource<15>; > def HW2 : ProcResource<3>; > > LatencyFactor=15 > ResourceFactor(HW1)=1 > ResourceFactor(HW2)=5 > > > 2. Southern Islands has 256 registers, but there is a significant > > performance penalty if you use more than a certain amount. Do any of > > the MachineSchedulers support switching into an 'optimize for register > > pressure mode' once it detects register pressure above a certain limit? > > > The code in ConvergingScheduler (I’ll rename it to GenericScheduler soon) is meant to demonstrate most of the features so developers can copy what they need into their own strategy, add heuristics and change the underlying data structures, which often makes sense. You can decide whether you want only bottom-up, top-down, or both. > > For an in-order processor, I think this becomes much simpler. You do away with most of the complexity in ConvergingScheduler::SchedBoundary and implement a straightforward reservation table. If it’s fully pipelined then you just count resource units for the current cycle until one reaches the latency factor. 
If it’s not fully pipelined, then you need to define ResourceCycles in the machine’s SchedWrite definitions and implement a simple reservation table (mark earliest cycle at which a resource is used for bottom-up scheduling). Some of this can be made a generic utility, but it’s not much to implement. > > Since the strategy defines the priority queues, you can do whatever you want for your register pressure heuristics. From scanning the full queue each time with dynamic heuristics, to resorting, to dynamically deferring nodes... > > Note that the register pressure tracking is handled outside of the strategy, in ScheduleDAGMI. So you get this for free without duplication. > > However, querying pressure change for a candidate is done by the strategy. The generic interface, getMaxPressureDelta(), is very clunky now. I’m going to improve it, but If you’re writing a target specific strategy, it’s probably easier to directly query a pressure set for a specific register class. > > e.g. > RC =TRI->getRegClass(R) > *PSetID = TRI->getRegClassPressureSets(RC) > > Now you have a raw pointer right into the machine model’s null terminated array of PSetIDs that are affected by a register class (targets often have several overlapping register classes). You can choose one of those sets to track or track them all. I’m about to commit a patch that will have them sorted by number of regs in the set, so you can easily grab the largest (end of the list). > > Then you can directly query pressure for a specific set... > > P = RPTracker.getPressureAfterInst(I) > diff = P[PsetID] - RPTracker.getRegSetPressureAtPos()[PSetID] > > Note that how you define your target’s registers can make a big difference in the pressure set formation. Yours don’t look to bad, but in general remember to use isAllocatable=0 for any classes that don’t take part in regalloc. > Hi Andy, Thanks for the response, this is really helpful. 
-Tom From g.franceschetti at vidya.it Mon Jul 22 22:22:39 2013 From: g.franceschetti at vidya.it (Giorgio Franceschetti) Date: Tue, 23 Jul 2013 07:22:39 +0200 Subject: [LLVMdev] Build Clang and LLVM on Win 8 In-Reply-To: <51ED9B35.4010304@vidya.it> References: <51EC3731.1040104@vidya.it> <874nbnk2e8.fsf@wanadoo.es> <87y58zidax.fsf@wanadoo.es> <51ED9B35.4010304@vidya.it> Message-ID: <51EE131F.70008@vidya.it> I also tried to build LLVM with the 3.3 sources. Same problems. Even worse, Visual Studio hangs and I had to kill the process. What could it be? Is Visual Studio 2012 working with LLVM/Clang? Or is LLVM/Clang not supposed to work on Windows? (I also saw that there are no binaries ready for the Windows platform.) Thanks in advance, Giorgio Il 22/07/2013 22.51, Giorgio Franceschetti ha scritto: > Hi all, > yes, I do not know python and I installed it only for being able to > build LLVM. > Now I have installed version 2.7. > > I tried with codeblock project generation, but I'm still getting errors. > > So I moved to visual studio as per "getting started" guide. > > I run the command: cmake -G "Visual Studio 11" ..\llvm from my build > folder. > > It lists a lot of file not found during the execution, but at the end > it does create th visual studio projects. > Based on the web guide, it should be successful. > First question, is it really? > > Then, I open visual studio and run the solution compilation. > > But, after a long time, I got a lot of errors stating that it is not > possible to find the stdbool.h file + a few others.
> Example: > error C1083: Impossibile aprire il file inclusione 'stdbool.h': No > such file or directory ( path>\llvm\projects\compiler-rt\lib\floatuntisf.c) path>\llvm\projects\compiler-rt\lib\int_lib.h 37 1 clang_rt.x86_64 > error C2061: errore di sintassi: identificatore '__attribute__' ( path>\llvm\projects\compiler-rt\lib\int_util.c) path>\llvm\projects\compiler-rt\lib\int_util.h 27 1 clang_rt.x86_64 > error C2059: errore di sintassi: ';' ( path>\llvm\projects\compiler-rt\lib\int_util.c) path>\llvm\projects\compiler-rt\lib\int_util.h 27 1 clang_rt.x86_64 > error C2182: 'noreturn': utilizzo non valido del tipo 'void' ( path>\llvm\projects\compiler-rt\lib\int_util.c) path>\llvm\projects\compiler-rt\lib\int_util.h 27 1 clang_rt.x86_64 > error C1083: Impossibile aprire il file inclusione 'stdbool.h': No > such file or directory ( path>\llvm\projects\compiler-rt\lib\int_util.c) path>\llvm\projects\compiler-rt\lib\int_lib.h 37 1 clang_rt.x86_64 > > > What could it be? > > Any help is appreciated, > > Giorgio > > Il 22/07/2013 03.38, Óscar Fuentes ha scritto: >> Reid Kleckner writes: >> >>> My initial impression was that still probably nobody uses python 3, >>> so it's >>> not worth adding support that will break. But if users actually have >>> python 3, maybe it's worth it. >> I think that on this case the problem was not people who actually have >> python 3, but people who see Python as a requirement for building LLVM >> and go to python.org and download the "most recent" version, i.e. python >> 3, because they are unaware of the incompatibilities. Believe it or not, >> there are developers who don't know about the Python mess :-) >> >> If adding support for version 3 is problematic, a check that gives a >> helpful message would be a good start. If it can't be implemented on the >> python scripts, it could be implemented on the cmake/configure scripts. 
>> BTW, http://llvm.org/docs/GettingStarted.html mentions Python as a >> requirement for the automated test suite (not for the build.) Says >> version >=2.4. A user reading that would assume that version 3.X is ok, >> or no Python at all if he only wishes to play with LLVM. >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > From kumarsukhani at gmail.com Mon Jul 22 22:27:47 2013 From: kumarsukhani at gmail.com (Kumar Sukhani) Date: Tue, 23 Jul 2013 10:57:47 +0530 Subject: [LLVMdev] Compiling "vmkit" on Ubuntu_x64 - Error: missing argument to --bindir In-Reply-To: References: <51E93D97.10103@gmail.com> <51E94DD2.4060802@gmail.com> <51ED3CB8.8030001@gmail.com> Message-ID: Thanks for your fast responses; this group seems very active. On Mon, Jul 22, 2013 at 8:45 PM, Gaël Thomas wrote: > > . If you plan to make the port, feel free to send us a lot of patches :) > > If I get time to work on it, I will definitely do that. > Gaël > Le 22 juil. 2013 17:09, "Kumar Sukhani" a écrit : > > here its mentioned that its portable on ARM. So >> simply cross-compiling will work? >> >> On Mon, Jul 22, 2013 at 7:37 PM, Harris BAKIRAS wrote: >> >>> Hello Kumar, >>> >>> Unfortunately we never experienced on ARM architecture and we are not >>> planning to port VMKit on ARM for the moment. >>> >>> Regards, >>> >>> Harris Bakiras >>> >>> On 07/19/2013 05:50 PM, Kumar Sukhani wrote: >>> >>> I am working on a project to port JRuby on Embedded systems. JRuby >>> converts Ruby code to bytecode which is executed by any JVM. For this >>> project I am testing performance of JRuby with various available JVMs. I >>> have chosen ARM architecture.
>>> Does vmkit support ARM architecture? >>> >>> On Fri, Jul 19, 2013 at 8:01 PM, Harris BAKIRAS wrote: >>> >>>> I don't know how JRuby works, maybe it uses some new feature that GNU >>>> Classpath does not provide. >>>> >>>> VMKit's openJDK version is unstable on 64 bits since package version >>>> 6b27. >>>> You can still use it for very small programs which does not need GC but >>>> that's all. >>>> >>>> It works fine on 32 bits. >>>> So you can try it on 32 bits or revert your java version to a previous >>>> one (< than 6b27) to test it on 64 bits. >>>> >>>> We are working on fixing the 64 bits issue as soon as possible. >>>> >>>> Harris Bakiras >>>> >>>> On 07/19/2013 03:47 PM, Kumar Sukhani wrote: >>>> >>>> Hi Harris Bakiras, >>>> Thanks for reply. It working now. >>>> Actually I wanted to try vmkit VM to run jruby codes. >>>> >>>> vmkit is able to run Java program, but when I try to run JRuby code >>>> then I get following error - >>>> >>>> root at komal:/home/komal/Desktop/GSOC/programs# jruby hello.rb >>>>> >>>>> Platform.java:39:in `getPackageName': java.lang.NullPointerException >>>>> >>>>> from ConstantSet.java:84:in `getEnumClass' >>>>> >>>>> from ConstantSet.java:60:in `getConstantSet' >>>>> >>>>> from ConstantResolver.java:181:in `getConstants' >>>>> >>>>> from ConstantResolver.java:102:in `getConstant' >>>>> >>>>> from ConstantResolver.java:146:in `intValue' >>>>> >>>>> from OpenFlags.java:28:in `value' >>>>> >>>>> from RubyFile.java:254:in `createFileClass' >>>>> >>>>> from Ruby.java:1273:in `initCore' >>>>> >>>>> from Ruby.java:1101:in `bootstrap' >>>>> >>>>> from Ruby.java:1079:in `init' >>>>> >>>>> from Ruby.java:179:in `newInstance' >>>>> >>>>> from Main.java:217:in `run' >>>>> >>>>> from Main.java:128:in `run' >>>>> >>>>> from Main.java:97:in `main' >>>>> >>>>> >>>> Can you tell me what will be the issue ? >>>> Vmkit doesn't work with OpenJDK ? 
>>>> >>>> On Fri, Jul 19, 2013 at 6:52 PM, Harris BAKIRAS wrote: >>>> >>>>> Hi Kumar, >>>>> >>>>> There is an error on your configuration line, you should provide the >>>>> path to llvm-config binary instead of configure file. >>>>> Assuming that you compiled llvm in release mode, the llvm-config >>>>> binary is located in : >>>>> >>>>> YOUR_PATH_TO_LLVM/Release+Asserts/bin/llvm-config >>>>> >>>>> Try to change the -with-llvm-config-path option and it will compile. >>>>> >>>>> Harris Bakiras >>>>> >>>>> On 07/19/2013 02:36 PM, Kumar Sukhani wrote: >>>>> >>>>> To compile vmkit on Ubuntu 12.04 64-bit machine, I followed the >>>>> steps giving here [1]. >>>>> but when I run ./configure I am getting following error- >>>>> >>>>> root at komal:/home/komal/Desktop/GSOC/vmkit/vmkit# ./configure >>>>>>> -with-llvm-config-path=../llvm-3.3.src/configure >>>>>>> --with-gnu-classpath-glibj=/usr/local/classpath/share/classpath/glibj.zip >>>>>>> --with-gnu-classpath-libs=/usr/local/classpath/lib/classpath >>>>>> >>>>>> checking build system type... x86_64-unknown-linux-gnu >>>>>> >>>>>> checking host system type... x86_64-unknown-linux-gnu >>>>>> >>>>>> checking target system type... x86_64-unknown-linux-gnu >>>>>> >>>>>> checking type of operating system we're going to host on... Linux >>>>>> >>>>>> configure: error: missing argument to --bindir >>>>>> >>>>>> configure: error: Cannot find (or not executable) >>>>>> >>>>>> >>>>> I tried searching it online but didn't got any similar issue. 
>>>>> >>>>> [1] http://vmkit.llvm.org/get_started.html >>>>> >>>>> -- >>>>> Kumar Sukhani >>>>> +919579650250 >>>>> >>>>> >>>>> _______________________________________________ >>>>> LLVM Developers mailing listLLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> LLVM Developers mailing list >>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>> >>>>> >>>> >>>> >>>> -- >>>> Kumar Sukhani >>>> +919579650250 >>>> >>>> >>>> >>> >>> >>> -- >>> Kumar Sukhani >>> +919579650250 >>> >>> >>> >> >> >> -- >> Kumar Sukhani >> +919579650250 >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> -- Kumar Sukhani +919579650250 -------------- next part -------------- An HTML attachment was scrubbed... URL: From kai.nacke at redstar.de Mon Jul 22 22:44:25 2013 From: kai.nacke at redstar.de (Kai Nacke) Date: Tue, 23 Jul 2013 07:44:25 +0200 Subject: [LLVMdev] Build Clang and LLVM on Win 8 In-Reply-To: <51ED9B35.4010304@vidya.it> References: <51EC3731.1040104@vidya.it> <874nbnk2e8.fsf@wanadoo.es> <87y58zidax.fsf@wanadoo.es> <51ED9B35.4010304@vidya.it> Message-ID: <51EE1839.6040204@redstar.de> Hi Giorgio, here is another description how to compile LLVM on Windows: http://wiki.dlang.org/Building_and_hacking_LDC_on_Windows_using_MSVC Maybe this is helpful. I created this for Windows 7 but I also repeated it successfully on Windows 8. Regards Kai On 22.07.2013 22:51, Giorgio Franceschetti wrote: > Hi all, > yes, I do not know python and I installed it only for being able to > build LLVM. > Now I have installed version 2.7. > > I tried with codeblock project generation, but I'm still getting errors. > > So I moved to visual studio as per "getting started" guide. 
> > I run the command: cmake -G "Visual Studio 11" ..\llvm from my build > folder. > > It lists a lot of file not found during the execution, but at the end it > does create th visual studio projects. > Based on the web guide, it should be successful. > First question, is it really? > > Then, I open visual studio and run the solution compilation. > > But, after a long time, I got a lot of errors stating that it is not > possible to find the stdbool.h file + a few others. > Example: > error C1083: Impossibile aprire il file inclusione 'stdbool.h': No such > file or directory ( path>\llvm\projects\compiler-rt\lib\floatuntisf.c) path>\llvm\projects\compiler-rt\lib\int_lib.h 37 1 clang_rt.x86_64 > error C2061: errore di sintassi: identificatore '__attribute__' ( path>\llvm\projects\compiler-rt\lib\int_util.c) path>\llvm\projects\compiler-rt\lib\int_util.h 27 1 > clang_rt.x86_64 > error C2059: errore di sintassi: ';' ( path>\llvm\projects\compiler-rt\lib\int_util.c) path>\llvm\projects\compiler-rt\lib\int_util.h 27 1 > clang_rt.x86_64 > error C2182: 'noreturn': utilizzo non valido del tipo 'void' ( path>\llvm\projects\compiler-rt\lib\int_util.c) path>\llvm\projects\compiler-rt\lib\int_util.h 27 1 > clang_rt.x86_64 > error C1083: Impossibile aprire il file inclusione 'stdbool.h': No such > file or directory ( path>\llvm\projects\compiler-rt\lib\int_util.c) path>\llvm\projects\compiler-rt\lib\int_lib.h 37 1 clang_rt.x86_64 > > > What could it be? > > Any help is appreciated, > > Giorgio > > Il 22/07/2013 03.38, Óscar Fuentes ha scritto: >> Reid Kleckner writes: >> >>> My initial impression was that still probably nobody uses python 3, >>> so it's >>> not worth adding support that will break. But if users actually have >>> python 3, maybe it's worth it. >> I think that on this case the problem was not people who actually have >> python 3, but people who see Python as a requirement for building LLVM >> and go to python.org and download the "most recent" version, i.e. 
python >> 3, because they are unaware of the incompatibilities. Believe it or not, >> there are developers who don't know about the Python mess :-) >> >> If adding support for version 3 is problematic, a check that gives a >> helpful message would be a good start. If it can't be implemented on the >> python scripts, it could be implemented on the cmake/configure scripts. >> >> BTW, http://llvm.org/docs/GettingStarted.html mentions Python as a >> requirement for the automated test suite (not for the build.) Says >> version >=2.4. A user reading that would assume that version 3.X is ok, >> or no Python at all if he only wishes to play with LLVM. >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From tanmx_star at yeah.net Mon Jul 22 23:58:17 2013 From: tanmx_star at yeah.net (Star Tan) Date: Tue, 23 Jul 2013 14:58:17 +0800 (CST) Subject: [LLVMdev] Analysis of polly-detect overhead in oggenc In-Reply-To: <51ED4164.9000008@grosser.es> References: <51E02B5C.3030208@grosser.es> <727cd06.251b.13fd9059546.Coremail.tanmx_star@yeah.net> <51E19CAF.4050500@grosser.es> <73bc0067.345f.13fdb666d45.Coremail.tanmx_star@yeah.net> <51E2352A.6090706@grosser.es> <7aa4b9a.3bcb.13fdc808c5f.Coremail.tanmx_star@yeah.net> <32e1ec0e.464a.13fddb6c5c7.Coremail.tanmx_star@yeah.net> <51E31F3E.3080501@grosser.es> <51EC1D0F.6000800@grosser.es> <1f586e46.3cca.1400439c171.Coremail.tanmx_star@yeah.net> <51ECB235.4060901@grosser.es> <8e12161.64f9.140058f158f.Coremail.tanmx_star@yeah.net> <51ED4164.9000008@grosser.es> Message-ID: <3ae8cb58.2cb9.1400a524281.Coremail.tanmx_star@yeah.net> Hi Tobias, I have attached a patch file to optimize string operations in Polly-Detect pass. 
In this patch file, I put most of the long string operations behind the condition variable "PollyViewMode" or inside DEBUG mode. Bests, Star Tan At 2013-07-22 22:27:48,"Tobias Grosser" wrote: >On 07/22/2013 01:46 AM, Star Tan wrote: >> At 2013-07-22 12:16:53,"Tobias Grosser" wrote: >>> I propose two more patches: >>> >>> 1) Transform the INVALID macro into function calls, that format >>> the text and that set LastFailure. >> Translating the INVALID macro into function calls would complicate the operations for different counters. >> For example, currently users can add a new counter by simply adding the following line: >> BADSCOP_STAT(SimpleLoop, "Loop not in -loop-simplify form"); >> But if we translate INVALID macro into function calls, then users has to add a function for this counter: >> void INVLIAD_SimpleLoop (...). \ > > ^^ No uppercase in function names. > >> This is because we cannot use the following macro combination in function calls: >> if (!Context.Verifying) \ >> ++Bad##NAME##ForScop; >> So, I do not think it is necessary to translate the INVALID macro into function calls. >> Do you still think we should translate INVALID macro into a serial of functions like "invalid_CFG, invalid_IndVar, invalid_IndEdge, ... ? In that case, I could provide a small patch file for this purpose -:) > >I think it would still be nice to get rid of this macro. We could >probably have a default function that takes an enum to report different >errors in the reportInvalid(enum errorKind) style. And then several >others that would allow more complex formatting (e.g. >reportInvalidAlias(AliasSet)). Especially the code after 'isMustAlias()' >would be nice to move out of the main scop detection. > >However, this issue is not directly related to the speedup work, so >you are welcome to skip it for now. > >(Btw. thanks for not blindly following my suggestions!)
> >>> 2) Add checks at the beginning of those function calls and >>> continue only if LogErrors is set >> Those invalid log strings are used for two separate cases: >> 1) The first case is for detailed debugging, which is controlled by the macro DEBUG(dbgs() << MESSAGE). In such a case, string operations will automatically be skipped in normal execution mode with the following if-statement: >> if (::llvm::DebugFlag && ::llvm::isCurrentDebugType(TYPE)) >> That means string operations controlled by DEBUG will not execute in the normal case, so we should not worry about it. >> 2) The other case is for the variable "LastFailure", which is used only in GraphPrinter. Currently string operations for "LastFailure" always execute in normal cases. My idea is to put such string operations under the condition of "GraphPrinter" mode. For example, I would like to translate the "INVALID" macro into: >> #define INVALID(NAME, MESSAGE) \ >> do { \ >> if (GraphViewMode) { \ >> std::string Buf; \ >> raw_string_ostream fmt(Buf); \ >> fmt << MESSAGE; \ >> fmt.flush(); \ >> LastFailure = Buf; \ >> } \ >> DEBUG(dbgs() << MESSAGE); \ >> DEBUG(dbgs() << "\n"); \ >> assert(!Context.Verifying && #NAME); \ >> if (!Context.Verifying) \ >> ++Bad##NAME##ForScop; \ >> } while (0) > >Looks good. > >> As you have suggested, we can construct the condition GraphViewMode with "-polly-show", "-polly-show-only", "polly-dot" and "polly-dot-only". However, I see all these options are defined as "static" variables in lib/RegisterPasses.cpp. Do you think I should translate these local variables into global variables or should I define another option like "-polly-dot-scop" in ScopDetection.cpp? > >You can define a new option -polly-detect-collect-errors that enables >the error tracking. Adding cl::location to this option allows you to >store the option value externally. You can use this to automatically >set this option in lib/RegisterPasses.cpp, in case -polly-show, >-polly-show-only, ... have been set.
> >Cheers, >Tobias -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-ScopDetect-Optimize-compile-time-cost-for-log-string.patch Type: application/octet-stream Size: 9481 bytes Desc: not available URL: From baembel at gmx.de Tue Jul 23 02:28:10 2013 From: baembel at gmx.de (Boris Boesler) Date: Tue, 23 Jul 2013 11:28:10 +0200 Subject: [LLVMdev] support for addressing units which are not 8 bits In-Reply-To: References: Message-ID: Am 20.06.2013 um 18:51 schrieb Eli Friedman: > On Thu, Jun 20, 2013 at 6:14 AM, Boris Boesler wrote: > Hi! > > I want to write an LLVM back-end for a bit-addressing target architecture. ... > And what does "a lot of code in the backend" in the link above mean? Which other parts are involved? > > The tricky part here isn't addressing, it's that each address points at 32 bits, so you have to track down every single place LLVM hardcodes "i8" and fix it. > > CC'ing Philipp Brüschweiler, who had a patch series a while back (see http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120702/146050.html). Finally I had some time to give it a try. I can't apply the patches to llvm 3.2 because the files include/llvm/Target/TargetData.h and lib/Target/TargetData.cpp don't exist anymore. Probably they have been split into multiple new files. But applying the patches to the new files by hand should not be too hard. Anyway, are there plans to add this kind of feature to the trunk? Isn't there some demand? Boris From conormacaoidh at gmail.com Tue Jul 23 04:03:39 2013 From: conormacaoidh at gmail.com (Conor Mac Aoidh) Date: Tue, 23 Jul 2013 12:03:39 +0100 Subject: [LLVMdev] Vector DAG Patterns Message-ID: <51EE630B.1030002@gmail.com> Hi All, Been having a problem constructing a suitable pattern to represent some vector operations in the DAG. Stuff like andx/orx operations where elements of a vector are anded/ored together.
My approach thus far has been to extract the sub elements of the vector and and/or those elements. This is ok for 4 vectors of i32s, but becomes cumbersome for v16i8s. Example instruction: andx $dst $v1 Pattern: [(set RC:$dst, (and (i32 (vector_extract(vt VC:$src), 0 ) ), (and (i32 (vector_extract(vt VC:$src), 1 ) ), (and (i32 (vector_extract(vt VC:$src), 2 ) ), (i32 (vector_extract(vt VC:$src), 3 ) ) ) ) ) )] Is there a better way to do this? Regards --- Conor Mac Aoidh From rkotler at mips.com Tue Jul 23 04:34:11 2013 From: rkotler at mips.com (reed kotler) Date: Tue, 23 Jul 2013 04:34:11 -0700 Subject: [LLVMdev] -Os Message-ID: <51EE6A33.2010203@mips.com> When I use -Os with a clang that implicitly calls llc, I get much different code than when call clang first with -Os and then call llc. How do I get these two paths to generate the same code? Tia. Reed From rafael.espindola at gmail.com Tue Jul 23 05:03:46 2013 From: rafael.espindola at gmail.com (=?UTF-8?Q?Rafael_Esp=C3=ADndola?=) Date: Tue, 23 Jul 2013 08:03:46 -0400 Subject: [LLVMdev] -Os In-Reply-To: <51EE6A33.2010203@mips.com> References: <51EE6A33.2010203@mips.com> Message-ID: check that the same passes are running and check that you are getting the same inline threshold. On 23 July 2013 07:34, reed kotler wrote: > When I use -Os with a clang that implicitly calls llc, I get much different > code than when call clang first with -Os and then call llc. > > How do I get these two paths to generate the same code? > > Tia. 
> > Reed > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From annulen at yandex.ru Tue Jul 23 05:19:55 2013 From: annulen at yandex.ru (Konstantin Tokarev) Date: Tue, 23 Jul 2013 16:19:55 +0400 Subject: [LLVMdev] -Os In-Reply-To: <51EE6A33.2010203@mips.com> References: <51EE6A33.2010203@mips.com> Message-ID: <3431374581995@web12h.yandex.ru> 23.07.2013, 16:05, "reed kotler" : > When I use -Os with a clang that implicitly calls llc, I get much > different code than when call clang first with -Os and then call llc. clang does NOT call llc internally. > How do I get these two paths to generate the same code? Call llc with -Os? -- Regards, Konstantin From rkotler at mips.com Tue Jul 23 05:22:07 2013 From: rkotler at mips.com (reed kotler) Date: Tue, 23 Jul 2013 05:22:07 -0700 Subject: [LLVMdev] -Os In-Reply-To: <3431374581995@web12h.yandex.ru> References: <51EE6A33.2010203@mips.com> <3431374581995@web12h.yandex.ru> Message-ID: <51EE756F.80503@mips.com> On 07/23/2013 05:19 AM, Konstantin Tokarev wrote: > > 23.07.2013, 16:05, "reed kotler" : >> When I use -Os with a clang that implicitly calls llc, I get much >> different code than when call clang first with -Os and then call llc. > clang does NOT call llc internally. I understand. I meant that it invokes the same functionality, not that it technically runs that command. > >> How do I get these two paths to generate the same code? > Call llc with -Os? You can't pass -Os to llc. -Os is a clang-only option. It's reflected in the IR used by llc in the function attributes. From baldrick at free.fr Tue Jul 23 06:04:13 2013 From: baldrick at free.fr (Duncan Sands) Date: Tue, 23 Jul 2013 15:04:13 +0200 Subject: [LLVMdev] Inverse of ConstantFP::get and similar functions?
In-Reply-To: References: Message-ID: <51EE7F4D.7090403@free.fr> Hi Stephen, On 22/07/13 19:19, Stephen Lin wrote: > Hi, > > I noticed that ConstantFP::get automatically returns the appropriately > typed Constant depending on the LLVM type passed in (i.e. if called > with a vector, it returns a splat vector with the given constant). > > Is there any simple way to do the inverse of this function? i.e., > given an llvm::Value, check whether it is either a scalar of the given > constant value or a splat vector with the given constant value? I > can't seem to find any, and it doesn't look like the pattern matching > interface provides something similar to this either. yes, getUniqueInteger. Ciao, Duncan. From renato.golin at linaro.org Tue Jul 23 06:32:06 2013 From: renato.golin at linaro.org (Renato Golin) Date: Tue, 23 Jul 2013 14:32:06 +0100 Subject: [LLVMdev] -Os In-Reply-To: <51EE6A33.2010203@mips.com> References: <51EE6A33.2010203@mips.com> Message-ID: On 23 July 2013 12:34, reed kotler wrote: > How do I get these two paths to generate the same code? > Hi Reed, First, make sure that it's really -Os that is generating the difference by calling clang -v -Os, explicitly passing all flags apart from -Os, and then compare the results. Try to mimic as many parameters in llc as possible, with llc-specific flags. I've seen this happen when lowering to specific ARM cores and if I didn't use the same relevant flags on both sides, the final code would be broken. --renato -------------- next part -------------- An HTML attachment was scrubbed...
URL: From stewartd at codeaurora.org Tue Jul 23 06:55:53 2013 From: stewartd at codeaurora.org (Daniel Stewart) Date: Tue, 23 Jul 2013 09:55:53 -0400 Subject: [LLVMdev] Question on optimizeThumb2JumpTables Message-ID: <00e401ce87ac$595e73d0$0c1b5b70$@codeaurora.org> In looking at the code in ARMConstantIslandPass.cpp::optimizeThumb2JumpTables(), I see that there is the following condition for not creating tbb-based jump tables: // The instruction should be a tLEApcrel or t2LEApcrelJT; we want // to delete it as well. MachineInstr *LeaMI = PrevI; if ((LeaMI->getOpcode() != ARM::tLEApcrelJT && LeaMI->getOpcode() != ARM::t2LEApcrelJT) || LeaMI->getOperand(0).getReg() != BaseReg) OptOk = false; if (!OptOk) continue; I am trying to figure out why the restriction of LeaMI->getOperand(0).getReg() != BaseReg is there. It seems this is overly restrictive. For example, here is a case where it succeeds: 8944B BB#53: derived from LLVM BB %172 Live Ins: %R4 %R6 %D8 %Q5 %R9 %R7 %R8 %R10 %R5 %R11 Predecessors according to CFG: BB#52 8976B %R1 = t2LEApcrelJT , 2, pred:14, pred:%noreg 8992B %R1 = t2ADDrs %R1, %R10, 18, pred:14, pred:%noreg, opt:%noreg 9004B %LR = t2MOVi 1, pred:14, pred:%noreg, opt:%noreg 9008B t2BR_JT %R1, %R10, , 2 Shrink JT: t2BR_JT %R1, %R10, , 2 addr: %R1 = t2ADDrs %R1, %R10, 18, pred:14, pred:%noreg, opt:%noreg lea: %R1 = t2LEApcrelJT , 2, pred:14, pred:%noreg From this we see that the BaseReg = R1. R1 also happens to be the register used in the t2ADDrs calculation as well as defined by the t2LEApcrelJT operation. Because R1 is defined by t2LEApcrelJT, the restriction is met.
However, in the next example, it fails: 5808B BB#30: derived from LLVM BB %105 Live Ins: %R4 %R6 %D8 %Q5 %R9 %R7 %R8 %R10 %R5 %R11 Predecessors according to CFG: BB#29 5840B %R3 = t2LEApcrelJT , 1, pred:14, pred:%noreg 5856B %R2 = t2ADDrs %R3, %R7, 18, pred:14, pred:%noreg, opt:%noreg 5872B t2BR_JT %R2, %R7, , 1 Successors according to CFG: BB#90(17) BB#31(17) BB#32(17) BB#33(17) BB#34(17) BB#51(17) Here we see that the BaseReg = R2. But the t2LEApcrelJT instruction defines R3, not R2. But this should be fine, because the t2ADDrs instruction takes R3 and defines R2, which is the real base address. So my question is why is the restriction LeaMI->getOperand(0).getReg() != BaseReg there? Shouldn't the restriction be trying to ensure that the register defined by t2LEApcrelJT is also the register used by the t2ADDrs instruction? It seems this test is being overly restrictive. Daniel -------------- next part -------------- An HTML attachment was scrubbed... URL: From rasha.sala7 at gmail.com Tue Jul 23 07:07:49 2013 From: rasha.sala7 at gmail.com (Rasha Omar) Date: Tue, 23 Jul 2013 16:07:49 +0200 Subject: [LLVMdev] Steps to addDestination Message-ID: Hi, I need to addDestination to some basic blocks I used the following code Value* Address; IndirectBrInst *IBI = IndirectBrInst::Create(Address, Result.size(),i->getTerminator() ); IBI->addDestination(i); The following error was issued void llvm::IndirectBrInst::init(llvm::Value *, unsigned int): Assertion `Address && Address->getType()->isPointerTy() && "Address of indirectbr must be a pointer"' failed. I need to know the steps to add a new indirect branch to a basic block with this method -- *Rasha Salah Omar Msc Student at E-JUST Demonstrator at Faculty of Computers and Informatics Benha University * -------------- next part -------------- An HTML attachment was scrubbed...
URL: From t.p.northover at gmail.com Tue Jul 23 07:38:20 2013 From: t.p.northover at gmail.com (Tim Northover) Date: Tue, 23 Jul 2013 15:38:20 +0100 Subject: [LLVMdev] Steps to addDestination In-Reply-To: References: Message-ID: Hi Rasha, > I need to addDestination to some basic blocks Just to make sure there's no confusion here: you really are trying to create code like: define i32 @foo(i1 %tst) { %Address = select i1 %tst, i8* blockaddress(@foo, %true), i8* blockaddress(@foo, %false) indirectbr i8* %Address, [label %true, label %false] ; This is what you're creating true: ret i32 42 false: ret i32 0 } and not: define i32 @bar(i1 %tst) { br i1 %tst, label %true, label %false ; You're not trying to create this true: ret i32 42 false: ret i32 0 } If that's incorrect, you're going about it entirely the wrong way. We can help with either, but it's best not to go too far on a misunderstanding. > Value* Address; > IndirectBrInst *IBI = IndirectBrInst::Create(Address, Result.size(),i->getTerminator() ); > > IBI->addDestination(i); The problem seems to be that "Address" is uninitialised but it needs to be set to some valid pointer value before creating your IndirectBr. To replicate my first example it would be set to the "select" instruction, for example; whatever produces your destination blockaddress. Cheers. Tim.
From tobias at grosser.es Tue Jul 23 08:50:12 2013 From: tobias at grosser.es (Tobias Grosser) Date: Tue, 23 Jul 2013 08:50:12 -0700 Subject: [LLVMdev] Analysis of polly-detect overhead in oggenc In-Reply-To: <3ae8cb58.2cb9.1400a524281.Coremail.tanmx_star@yeah.net> References: <51E02B5C.3030208@grosser.es> <727cd06.251b.13fd9059546.Coremail.tanmx_star@yeah.net> <51E19CAF.4050500@grosser.es> <73bc0067.345f.13fdb666d45.Coremail.tanmx_star@yeah.net> <51E2352A.6090706@grosser.es> <7aa4b9a.3bcb.13fdc808c5f.Coremail.tanmx_star@yeah.net> <32e1ec0e.464a.13fddb6c5c7.Coremail.tanmx_star@yeah.net> <51E31F3E.3080501@grosser.es> <51EC1D0F.6000800@grosser.es> <1f586e46.3cca.1400439c171.Coremail.tanmx_star@yeah.net> <51ECB235.4060901@grosser.es> <8e12161.64f9.140058f158f.Coremail.tanmx_star@yeah.net> <51ED4164.9000008@grosser.es> Message-ID: <51EEA634.90109@grosser.es> On 07/22/2013 11:58 PM, Star Tan wrote: > Hi Tobias, > > I have attached a patch file to optimize string operations in the Polly-Detect pass. > In this patch file, I put most of the long string operations in the condition variable of "PollyViewMode" or in the DEBUG mode. OK. > From 448482106e8d815afa40e4ce8543ba3f6f0237f1 Mon Sep 17 00:00:00 2001 > From: Star Tan > Date: Mon, 22 Jul 2013 23:48:45 -0700 > Subject: [PATCH] ScopDetect: Optimize compile-time cost for log string > operations. > > String operations resulting from raw_string_ostream in the INVALID macro can lead > to significant compile-time overhead when compiling large source code. > This is because raw_string_ostream relies on the TypeFinder class, whose > compile-time cost increases as the size of the module increases. This patch > aims to avoid calling TypeFinder in the normal case, so the TypeFinder class is only > called in DEBUG mode with the DEBUG macro or in PollyView mode.
> > With this patch file, the relative compile-time cost of Polly-detect pass does > not increase even compiling very large size source code. > --- > include/polly/Options.h | 3 ++ > include/polly/ScopDetection.h | 6 +++ > lib/Analysis/ScopDetection.cpp | 93 +++++++++++++++++++++++------------------- > lib/RegisterPasses.cpp | 22 ++++++---- > 4 files changed, 75 insertions(+), 49 deletions(-) > > diff --git a/include/polly/Options.h b/include/polly/Options.h > index 62e0960..733edd0 100644 > --- a/include/polly/Options.h > +++ b/include/polly/Options.h > @@ -17,4 +17,7 @@ > #include "llvm/Support/CommandLine.h" > > extern llvm::cl::OptionCategory PollyCategory; > +namespace polly { > + extern bool PollyViewMode; > +} > #endif > diff --git a/include/polly/ScopDetection.h b/include/polly/ScopDetection.h > index 6ee48ee..5a5d7d1 100755 > --- a/include/polly/ScopDetection.h > +++ b/include/polly/ScopDetection.h > @@ -145,6 +145,12 @@ class ScopDetection : public FunctionPass { > /// @return True if the call instruction is valid, false otherwise. > static bool isValidCallInst(CallInst &CI); > > + /// @brief Report invalid alias. > + /// > + /// @param AS The alias set. > + /// @param Context The context of scop detection. > + void reportInvalidAlias(AliasSet &AS, DetectionContext &Context) const; Nice. > diff --git a/lib/Analysis/ScopDetection.cpp b/lib/Analysis/ScopDetection.cpp > index 9b2a9a8..4f33f6c 100644 > --- a/lib/Analysis/ScopDetection.cpp > +++ b/lib/Analysis/ScopDetection.cpp > @@ -108,11 +108,13 @@ STATISTIC(ValidRegion, "Number of regions that a valid part of Scop"); > > #define INVALID(NAME, MESSAGE) \ > do { \ > - std::string Buf; \ > - raw_string_ostream fmt(Buf); \ > - fmt << MESSAGE; \ > - fmt.flush(); \ > - LastFailure = Buf; \ > + if (PollyViewMode) { \ I believe this variable should describe what we do, rather than if a a certain user of this feature is enabled. 
Something like if (TrackFailures) { } > +void ScopDetection::reportInvalidAlias(AliasSet &AS, > + DetectionContext &Context) const { It is great that you extracted this function. > + std::string Message; > + raw_string_ostream OS(Message); > + > + if (PollyViewMode || ::llvm::DebugFlag) { This is a little unsatisfying. We now have two conditions that need to be in sync. I think you can avoid this, if you rename the function to std::string ScopDetection::formatInvalidAlias(AliasSet &AS) and keep the INVALID_ macro at the place where the error happens. > diff --git a/lib/RegisterPasses.cpp b/lib/RegisterPasses.cpp > index 7fc0960..2e25e4d 100644 > --- a/lib/RegisterPasses.cpp > +++ b/lib/RegisterPasses.cpp > @@ -125,28 +125,34 @@ static cl::opt DeadCodeElim("polly-run-dce", > cl::Hidden, cl::init(false), cl::ZeroOrMore, > cl::cat(PollyCategory)); > > -static cl::opt > +bool polly::PollyViewMode; > + > +static cl::opt > PollyViewer("polly-show", > cl::desc("Highlight the code regions that will be optimized in a " > "(CFG BBs and LLVM-IR instructions)"), > - cl::init(false), cl::ZeroOrMore, cl::cat(PollyCategory)); > + cl::location(polly::PollyViewMode), cl::init(false), cl::ZeroOrMore, > + cl::cat(PollyCategory)); > > -static cl::opt > +static cl::opt > PollyOnlyViewer("polly-show-only", > cl::desc("Highlight the code regions that will be optimized in " > "a (CFG only BBs)"), > - cl::init(false), cl::cat(PollyCategory)); > + cl::location(polly::PollyViewMode), cl::init(false), > + cl::cat(PollyCategory)); > -static cl::opt > +static cl::opt > PollyPrinter("polly-dot", cl::desc("Enable the Polly DOT printer in -O3"), > cl::Hidden, cl::value_desc("Run the Polly DOT printer at -O3"), > - cl::init(false), cl::cat(PollyCategory)); > + cl::location(polly::PollyViewMode), cl::init(false), > + cl::cat(PollyCategory)); > > -static cl::opt PollyOnlyPrinter( > +static cl::opt PollyOnlyPrinter( > "polly-dot-only", > cl::desc("Enable the Polly DOT printer in -O3 (no BB content)"), 
cl::Hidden, > cl::value_desc("Run the Polly DOT printer at -O3 (no BB content"), > - cl::init(false), cl::cat(PollyCategory)); > + cl::location(polly::PollyViewMode), cl::init(false), > + cl::cat(PollyCategory)); Sorry. Having all options storing their value in the very same location does not look right. When I was talking about cl::location() I rather meant that you introduce a new option -polly-detect-track-failures that can also be set by a cl::location. Another alternative is that you add a parameter to Pass *polly::createScopDetectionPass() { return new ScopDetection(); } which can be used to enable the failure tracking. Cheers, Tobias From pawel.bylica at ibs.org.pl Tue Jul 23 09:29:01 2013 From: pawel.bylica at ibs.org.pl (=?UTF-8?Q?Pawe=C5=82_Bylica?=) Date: Tue, 23 Jul 2013 18:29:01 +0200 Subject: [LLVMdev] [Patch] WinCOFFObjectWriter: fix for storing pointer to string table in header name field Message-ID: Hi, Recently I was hit by an assert in WinCOFFObjectWriter that had forbidden storing a pointer to the string table in the header name field when the pointer had more than 6 decimal digits. This limit had been chosen to make the implementation easier (sprintf adds a null character at the end) and could be increased to 7 digits. My patch is attached. The implementation uses an additional buffer on the stack for the integer-to-string conversion. - Paweł Bylica -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: WinCOFFObjectWriter_PointerToStingTable.patch Type: application/octet-stream Size: 1529 bytes Desc: not available URL: From nico.rieck at gmail.com Tue Jul 23 09:42:35 2013 From: nico.rieck at gmail.com (Nico Rieck) Date: Tue, 23 Jul 2013 18:42:35 +0200 Subject: [LLVMdev] [Patch] WinCOFFObjectWriter: fix for storing pointer to string table in header name field In-Reply-To: References: Message-ID: <51EEB27B.8000301@gmail.com> On 23.07.2013 18:29, Paweł Bylica wrote: > Hi, > > Recently I was hit by an assert in WinCOFFObjectWriter that had forbidden > storing a pointer to the string table in the header name field when the pointer had > more than 6 decimal digits. This limit had been chosen to make > the implementation easier (sprintf adds a null character at the end) and could be > increased to 7 digits. I've already implemented this (and also included the undocumented base64 encoding), see: http://llvm-reviews.chandlerc.com/D667 Maybe someone actually hitting this limit in practice will make someone accept at least the first part of D667. -Nico From rnk at google.com Tue Jul 23 09:43:29 2013 From: rnk at google.com (Reid Kleckner) Date: Tue, 23 Jul 2013 12:43:29 -0400 Subject: [LLVMdev] [Patch] WinCOFFObjectWriter: fix for storing pointer to string table in header name field In-Reply-To: References: Message-ID: Is there a problem if the string is not null terminated? If not, you can snprintf it right into place instead of doing sprintf+memcpy. On Tue, Jul 23, 2013 at 12:29 PM, Paweł Bylica wrote: > Hi, > > Recently I was hit by an assert in WinCOFFObjectWriter that had forbidden > storing a pointer to the string table in the header name field when the pointer had > more than 6 decimal digits. This limit had been chosen to make > the implementation easier (sprintf adds a null character at the end) and could be > increased to 7 digits. > > My patch is attached. The implementation uses an additional buffer on the > stack for the integer-to-string conversion.
> > - Paweł Bylica > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nico.rieck at gmail.com Tue Jul 23 09:48:10 2013 From: nico.rieck at gmail.com (Nico Rieck) Date: Tue, 23 Jul 2013 18:48:10 +0200 Subject: [LLVMdev] [Patch] WinCOFFObjectWriter: fix for storing pointer to string table in header name field In-Reply-To: References: Message-ID: <51EEB3CA.7070307@gmail.com> On 23.07.2013 18:43, Reid Kleckner wrote: > Is there a problem if the string is not null terminated? If not, you can > snprintf it right into place instead of doing sprintf+mempcy. snprintf always null-terminates (and truncates if there's not enough space). -Nico From nico.rieck at gmail.com Tue Jul 23 09:55:04 2013 From: nico.rieck at gmail.com (Nico Rieck) Date: Tue, 23 Jul 2013 18:55:04 +0200 Subject: [LLVMdev] [Patch] WinCOFFObjectWriter: fix for storing pointer to string table in header name field In-Reply-To: <51EEB3CA.7070307@gmail.com> References: <51EEB3CA.7070307@gmail.com> Message-ID: <51EEB568.5030307@gmail.com> On 23.07.2013 18:48, Nico Rieck wrote:> On 23.07.2013 18:43, Reid Kleckner wrote: >> Is there a problem if the string is not null terminated? If not, you can >> snprintf it right into place instead of doing sprintf+mempcy. > > snprintf always null-terminates (and truncates if there's not enough > space). Urgh, nevermind. Brain fart here. 
-Nico From rnk at google.com Tue Jul 23 09:55:10 2013 From: rnk at google.com (Reid Kleckner) Date: Tue, 23 Jul 2013 12:55:10 -0400 Subject: [LLVMdev] [Patch] WinCOFFObjectWriter: fix for storing pointer to string table in header name field In-Reply-To: <51EEB3CA.7070307@gmail.com> References: <51EEB3CA.7070307@gmail.com> Message-ID: On Tue, Jul 23, 2013 at 12:48 PM, Nico Rieck wrote: > On 23.07.2013 18:43, Reid Kleckner wrote: > >> Is there a problem if the string is not null terminated? If not, you can >> snprintf it right into place instead of doing sprintf+mempcy. >> > > snprintf always null-terminates (and truncates if there's not enough > space). Nuh uh: "The _snprintf function formats and stores count or fewer characters in buffer, and appends a terminating null character if the formatted string length is strictly less than count characters." http://msdn.microsoft.com/en-us/library/2ts7cx93(v=vs.100).aspx Please don't assume snprintf always null terminates. This may be Windows-specific behavior that you shouldn't rely on. If that's the case, ignore my suggestion. -------------- next part -------------- An HTML attachment was scrubbed... URL: From nico.rieck at gmail.com Tue Jul 23 10:10:54 2013 From: nico.rieck at gmail.com (Nico Rieck) Date: Tue, 23 Jul 2013 19:10:54 +0200 Subject: [LLVMdev] [Patch] WinCOFFObjectWriter: fix for storing pointer to string table in header name field In-Reply-To: References: <51EEB3CA.7070307@gmail.com> Message-ID: <51EEB91E.7010606@gmail.com> On 23.07.2013 18:55, Reid Kleckner wrote: > Please don't assume snprintf always null terminates. I'm reading C99 7.19.6.4 and C11 7.21.6.5 which says: "If n is zero, nothing is written, and s may be a null pointer. Otherwise, output characters beyond the n-1st are discarded rather than being written to the array, and a null character is written at the end of the characters actually written into the array." 
and "Thus, the null-terminated output has been completely written if and only if the returned value is nonnegative and less than n." -Nico From swlin at post.harvard.edu Tue Jul 23 10:10:14 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Tue, 23 Jul 2013 10:10:14 -0700 Subject: [LLVMdev] Inverse of ConstantFP::get and similar functions? In-Reply-To: <51EE7F4D.7090403@free.fr> References: <51EE7F4D.7090403@free.fr> Message-ID: > On 22/07/13 19:19, Stephen Lin wrote: >> >> Hi, >> >> I noticed that ConstantFP::get automatically returns the appropriately >> typed Constant depending on the LLVM type passed in (i.e. if called >> with a vector, it returns a splat vector with the given constant). >> >> Is there any simple way to do the inverse of this function? i.e., >> given an llvm::Value, check whether it is either a scalar of the given >> constant value or a splat vector with the given constant value? I >> can't seem to find any, and it doesn't look like the pattern matching >> interface provides something similar to this either. > > > yes, getUniqueInteger. > > Ciao, Duncan. > Well, Eli already pointed me in the direction of m_SpecificFP, so no big deal now, but getUniqueInteger isn't what I needed because: 1. It's only for integers 2. It asserts when the vector does not contain all of the same integer, rather than returning some kind of failure code. Anyway, it's not a big deal now, but it does seem kind of odd to me that the Constant hierarchy of classes contains so many functions that allow easy abstraction of scalar and vector types in one direction (when creating constants), but not in the other direction (when detecting constants). Using the pattern matchers works okay, though.
Stephen From james.dutton at gmail.com Tue Jul 23 10:13:09 2013 From: james.dutton at gmail.com (James Courtier-Dutton) Date: Tue, 23 Jul 2013 18:13:09 +0100 Subject: [LLVMdev] Analysis of polly-detect overhead in oggenc In-Reply-To: <51EEA634.90109@grosser.es> References: <51E02B5C.3030208@grosser.es> <727cd06.251b.13fd9059546.Coremail.tanmx_star@yeah.net> <51E19CAF.4050500@grosser.es> <73bc0067.345f.13fdb666d45.Coremail.tanmx_star@yeah.net> <51E2352A.6090706@grosser.es> <7aa4b9a.3bcb.13fdc808c5f.Coremail.tanmx_star@yeah.net> <32e1ec0e.464a.13fddb6c5c7.Coremail.tanmx_star@yeah.net> <51E31F3E.3080501@grosser.es> <51EC1D0F.6000800@grosser.es> <1f586e46.3cca.1400439c171.Coremail.tanmx_star@yeah.net> <51ECB235.4060901@grosser.es> <8e12161.64f9.140058f158f.Coremail.tanmx_star@yeah.net> <51ED4164.9000008@grosser.es> <3ae8cb58.2cb9.1400a524281.Coremail.tanmx_star@yeah.net> <51EEA634.90109@grosser.es> Message-ID: On 23 July 2013 16:50, Tobias Grosser wrote: > On 07/22/2013 11:58 PM, Star Tan wrote: > >> Hi Tobias, >> >> >> I have attached a patch file to optimize string operations in >> Polly-Detect pass. >> In this patch file, I put most of long string operations in the condition >> variable of "PollyViewMode" or in the DEBUG mode. >> > > OK. > > Is there any way to make the debug messages themselves more efficient? Perhaps reimplementing it so that any table lookups are quick and no malloc/free is done in the process. James -------------- next part -------------- An HTML attachment was scrubbed... URL: From ahmed.bougacha at gmail.com Tue Jul 23 10:47:42 2013 From: ahmed.bougacha at gmail.com (Ahmed Bougacha) Date: Tue, 23 Jul 2013 10:47:42 -0700 Subject: [LLVMdev] [cfe-dev] Host compiler requirements: Dropping VS 2008, using C++11? In-Reply-To: References: Message-ID: I (arduously) updated the docs, so VS 2008 support is gone now. I’m still curious about C++11 though! 
-- Ahmed From legalize at xmission.com Tue Jul 23 11:36:03 2013 From: legalize at xmission.com (Richard) Date: Tue, 23 Jul 2013 12:36:03 -0600 Subject: [LLVMdev] [Patch] WinCOFFObjectWriter: fix for storing pointer to string table in header name field In-Reply-To: References: Message-ID: In article , Reid Kleckner writes: > Is there a problem if the string is not null terminated? If not, you can > snprintf it right into place instead of doing sprintf+memcpy. Am I the only one who scratches my head and says: sprintf? memcpy? Why are we using error-prone C APIs in C++ code? -- "The Direct3D Graphics Pipeline" free book The Computer Graphics Museum The Terminals Wiki Legalize Adulthood! (my blog) From grosbach at apple.com Tue Jul 23 11:36:08 2013 From: grosbach at apple.com (Jim Grosbach) Date: Tue, 23 Jul 2013 11:36:08 -0700 Subject: [LLVMdev] -Os In-Reply-To: <51EE756F.80503@mips.com> References: <51EE6A33.2010203@mips.com> <3431374581995@web12h.yandex.ru> <51EE756F.80503@mips.com> Message-ID: <6B918A28-DDB7-4D3A-B446-C717B702A3FB@apple.com> On Jul 23, 2013, at 5:22 AM, reed kotler wrote: > On 07/23/2013 05:19 AM, Konstantin Tokarev wrote: >> >> 23.07.2013, 16:05, "reed kotler" : >>> When I use -Os with a clang that implicitly calls llc, I get much >>> different code than when call clang first with -Os and then call llc. >> clang does NOT call llc internally. > I understand. I meant that it invokes the same functionality, not that it technically > runs that command. This isn’t just a nitpick. This is exactly why you’re seeing differences. The pass managers aren’t always set up the same, for example. FWIW, I feel your pain. This is a long-standing weakness of our infrastructure. -Jim
> > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From legalize at xmission.com Tue Jul 23 11:51:03 2013 From: legalize at xmission.com (Richard) Date: Tue, 23 Jul 2013 12:51:03 -0600 Subject: [LLVMdev] [Patch] WinCOFFObjectWriter: fix for storing pointer to string table in header name field In-Reply-To: References: <51EEB3CA.7070307@gmail.com> Message-ID: In article , Reid Kleckner writes: > Nuh uh: "The _snprintf function formats and stores count or fewer > characters in buffer, and appends a terminating null character if the > formatted string length is strictly less than count characters." > http://msdn.microsoft.com/en-us/library/2ts7cx93(v=vs.100).aspx > > Please don't assume snprintf always null terminates. > > This may be Windows-specific behavior that you shouldn't rely on. If > that's the case, ignore my suggestion. Yes, _snprintf with MSVC is not a conforming C99 (or whatever) standard implementation of snprintf. I've filed a bug on it, but don't expect a fix from them anytime soon as they deem it low priority. This is just another reason to stay away from these error-prone C APIs in C++ code. -- "The Direct3D Graphics Pipeline" free book The Computer Graphics Museum The Terminals Wiki Legalize Adulthood! (my blog) From g.franceschetti at vidya.it Tue Jul 23 13:02:25 2013 From: g.franceschetti at vidya.it (Giorgio Franceschetti) Date: Tue, 23 Jul 2013 22:02:25 +0200 Subject: [LLVMdev] Build Clang and LLVM on Win 8 In-Reply-To: <51EE1839.6040204@redstar.de> References: <51EC3731.1040104@vidya.it> <874nbnk2e8.fsf@wanadoo.es> <87y58zidax.fsf@wanadoo.es> <51ED9B35.4010304@vidya.it> <51EE1839.6040204@redstar.de> Message-ID: <51EEE151.7050400@vidya.it> On my PC it does not work. I'm able to generate and open the Visual studio project (VS 11 Win64). 
I have tried with cmake on the command line, with the cmake gui, release 3.4 and 3.3. But when I try to compile the solution I get a lot of errors about a missing stdbool.h file. Does anyone have any hint? Giorgio Il 23/07/2013 07.44, Kai Nacke ha scritto: > Hi Giorgio, > > here is another description how to compile LLVM on Windows: > http://wiki.dlang.org/Building_and_hacking_LDC_on_Windows_using_MSVC > > Maybe this is helpful. I created this for Windows 7 but I also > repeated it successfully on Windows 8. > > Regards > Kai > > On 22.07.2013 22:51, Giorgio Franceschetti wrote: >> Hi all, >> yes, I do not know python and I installed it only for being able to >> build LLVM. >> Now I have installed version 2.7. >> >> I tried with codeblock project generation, but I'm still getting errors. >> >> So I moved to visual studio as per "getting started" guide. >> >> I run the command: cmake -G "Visual Studio 11" ..\llvm from my build >> folder. >> >> It lists a lot of file not found during the execution, but at the end it >> does create the visual studio projects. >> Based on the web guide, it should be successful. >> First question, is it really? >> >> Then, I open visual studio and run the solution compilation. >> >> But, after a long time, I got a lot of errors stating that it is not >> possible to find the stdbool.h file + a few others.
>> Example: >> error C1083: Impossibile aprire il file inclusione 'stdbool.h': No such >> file or directory (> path>\llvm\projects\compiler-rt\lib\floatuntisf.c) > path>\llvm\projects\compiler-rt\lib\int_lib.h 37 1 clang_rt.x86_64 >> error C2061: errore di sintassi: identificatore '__attribute__' (> path>\llvm\projects\compiler-rt\lib\int_util.c) > path>\llvm\projects\compiler-rt\lib\int_util.h 27 1 >> clang_rt.x86_64 >> error C2059: errore di sintassi: ';' (> path>\llvm\projects\compiler-rt\lib\int_util.c) > path>\llvm\projects\compiler-rt\lib\int_util.h 27 1 >> clang_rt.x86_64 >> error C2182: 'noreturn': utilizzo non valido del tipo 'void' (> path>\llvm\projects\compiler-rt\lib\int_util.c) > path>\llvm\projects\compiler-rt\lib\int_util.h 27 1 >> clang_rt.x86_64 >> error C1083: Impossibile aprire il file inclusione 'stdbool.h': No such >> file or directory (> path>\llvm\projects\compiler-rt\lib\int_util.c) > path>\llvm\projects\compiler-rt\lib\int_lib.h 37 1 clang_rt.x86_64 >> >> >> What could it be? >> >> Any help is appreciated, >> >> Giorgio >> >> Il 22/07/2013 03.38, Óscar Fuentes ha scritto: >>> Reid Kleckner writes: >>> >>>> My initial impression was that still probably nobody uses python 3, >>>> so it's >>>> not worth adding support that will break. But if users actually have >>>> python 3, maybe it's worth it. >>> I think that on this case the problem was not people who actually have >>> python 3, but people who see Python as a requirement for building LLVM >>> and go to python.org and download the "most recent" version, i.e. >>> python >>> 3, because they are unaware of the incompatibilities. Believe it or >>> not, >>> there are developers who don't know about the Python mess :-) >>> >>> If adding support for version 3 is problematic, a check that gives a >>> helpful message would be a good start. If it can't be implemented on >>> the >>> python scripts, it could be implemented on the cmake/configure scripts. 
>>> >>> BTW, http://llvm.org/docs/GettingStarted.html mentions Python as a >>> requirement for the automated test suite (not for the build.) Says >>> version >=2.4. A user reading that would assume that version 3.X is ok, >>> or no Python at all if he only wishes to play with LLVM. >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >>> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > From rkotler at mips.com Tue Jul 23 14:19:13 2013 From: rkotler at mips.com (reed kotler) Date: Tue, 23 Jul 2013 14:19:13 -0700 Subject: [LLVMdev] ubuntu on the mac Message-ID: <51EEF351.9030109@mips.com> I have a new 11" mac air with lots of disk and ram. I'm trying to figure out the best way to configure it for llvm development on ubuntu. Bootcamp or Virtual Box (or other Virtualization). Bootcamp seems like it would be a nuisance. This is just not my main development machine but it's very portable. :) From ofv at wanadoo.es Tue Jul 23 14:53:04 2013 From: ofv at wanadoo.es (=?utf-8?Q?=C3=93scar_Fuentes?=) Date: Tue, 23 Jul 2013 23:53:04 +0200 Subject: [LLVMdev] Build Clang and LLVM on Win 8 References: <51EC3731.1040104@vidya.it> <874nbnk2e8.fsf@wanadoo.es> <87y58zidax.fsf@wanadoo.es> <51ED9B35.4010304@vidya.it> <51EE131F.70008@vidya.it> Message-ID: <87mwpdhrjz.fsf@wanadoo.es> Giorgio Franceschetti writes: > I also tried to build LLVM with 3.3 sources. > Same problems. If you omit compiler-rt, does it work? (compiler-rt is not a required component.) > Even worse, Visual Studio hangs and I had to kill the process. 
> > What could it be? Is Visual Studio 2012 working with LLVM/clang? > > Or LLVM/Clang is not supposed to work on windows (I saw also that > there are no binaries ready for the windows platform). For the most part, LLVM works fine on Windows, with some limitations. Clang has serious shortcomings, more so if you build it with VS (Mingw is better because Clang can use the headers and standard C++ library that comes with MinGW, but not the standard C++ library that comes with VS.) >> It lists a lot of file not found during the execution, but at the >> end it does create the visual studio projects. >> Based on the web guide, it should be successful. >> First question, is it really? Yes. What you are seeing are the platform checks, where the build system looks for the presence of functions, headers, etc and then generates a configuration file with that information. From chfast at gmail.com Tue Jul 23 15:07:16 2013 From: chfast at gmail.com (=?UTF-8?Q?Pawe=C5=82_Bylica?=) Date: Wed, 24 Jul 2013 00:07:16 +0200 Subject: [LLVMdev] [Patch] WinCOFFObjectWriter: fix for storing pointer to string table in header name field In-Reply-To: References: Message-ID: On Tue, Jul 23, 2013 at 8:36 PM, Richard wrote: > > In article ty+7zKZU6Ad4jZ5J5Rb0qoMa-bNtO0f+xK_8SYfy3RyFbg at mail.gmail.com>, > Reid Kleckner writes: > > > Is there a problem if the string is not null terminated? If not, you can > > snprintf it right into place instead of doing sprintf+memcpy. > > Am I the only one who scratches my head and says: > > sprintf? > memcpy? > > Why are we using error-prone C APIs in C++ code? I've just kept the style of the previous implementation. I'd love to use std::to_string(), but it is just not available. Moreover, you still have to move these 8 bytes of text somehow. -------------- next part -------------- An HTML attachment was scrubbed...
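The field being discussed is the 8-byte name in a COFF header; when a name is too long to fit, the field instead holds "/" followed by the decimal offset of the real name in the string table. A self-contained sketch of one way to fill such a field without depending on _snprintf's termination behavior (the helper name and exact layout are invented for illustration; this is not LLVM's actual code):

```cpp
#include <array>
#include <cassert>
#include <cstdio>
#include <cstring>

// Encode a string-table offset as "/<decimal>" into an 8-byte COFF-style
// name field. COFF does not require a terminator when all 8 bytes are in
// use, so format into an oversized scratch buffer first (where snprintf's
// termination quirks cannot hurt) and copy across exactly Len bytes.
std::array<char, 8> encodeSectionName(unsigned Offset) {
  std::array<char, 8> Name = {};  // zero-filled: short names come out padded
  char Buf[16];
  int Len = std::snprintf(Buf, sizeof(Buf), "/%u", Offset);
  assert(Len > 0 && Len <= 8 && "offset does not fit the 8-byte field");
  std::memcpy(Name.data(), Buf, static_cast<std::size_t>(Len));
  return Name;
}
```

The scratch buffer sidesteps the MSVC `_snprintf` problem Richard raises: whether a terminator lands in the scratch buffer is irrelevant, since only the `Len` formatted bytes are copied into the header field.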
URL: From rnk at google.com Tue Jul 23 15:30:06 2013 From: rnk at google.com (Reid Kleckner) Date: Tue, 23 Jul 2013 18:30:06 -0400 Subject: [LLVMdev] Cutting down the number of platform checks Message-ID: On Tue, Jul 23, 2013 at 5:53 PM, Óscar Fuentes wrote: > Yes. What you are seeing are the platform checks, where the build system > looks for the presence of functions, headers, etc and then generates a > configuration file with that information. I've been meaning to cut down on the number of these because they are super slow and wasteful. Some of them are dead and can be removed without discussion. Some of them are used inconsistently, like HAVE_STRING_H. Do we really support any platform that lacks a <string.h>? All of Errno.cpp is in an ifdef for this macro, but I suspect we include string.h elsewhere unconditionally. http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Support/Errno.cpp?revision=167191&view=markup Is everyone OK with eliminating checks for headers and symbols that we use unconditionally anyway? (assert.h, memcpy, etc) -------------- next part -------------- An HTML attachment was scrubbed... URL: From nrotem at apple.com Tue Jul 23 15:33:34 2013 From: nrotem at apple.com (Nadav Rotem) Date: Tue, 23 Jul 2013 15:33:34 -0700 Subject: [LLVMdev] Enabling the SLP vectorizer by default for -O3 In-Reply-To: References: <96EBFCFE-8AEA-428D-A72D-2DDD0335CB95@apple.com> Message-ID: <10520198-A465-4F22-A13C-8719B3F82A7E@apple.com> Hi, Sorry for the delay in response. I measured the code size change and noticed small changes in both directions for individual programs. I found a 30k binary size growth for the entire testsuite + SPEC. I attached an updated performance report that includes both compile time and performance measurements.
Thanks, Nadav On Jul 14, 2013, at 10:55 PM, Nadav Rotem wrote: > > On Jul 14, 2013, at 9:52 PM, Chris Lattner wrote: > >> >> On Jul 13, 2013, at 11:30 PM, Nadav Rotem wrote: >> >>> Hi, >>> >>> LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in a straight-line code. It is currently not enabled by default, and people who want to experiment with it can use the clang command line flag “-fslp-vectorize”. I ran LLVM’s test suite with and without the SLP vectorizer on a Sandybridge mac (using SSE4, w/o AVX). Based on my performance measurements (below) I would like to enable the SLP-vectorizer by default on -O3. I would like to hear what others in the community think about this and give other people the opportunity to perform their own performance measurements. >> >> This looks great Nadav. The performance wins are really big. How you investigated the bh and bullet regression though? > > Thanks. Yes, I looked at both. The hot function in BH is “gravsub”. The vectorized IR looks fine and the assembly looks fine, but for some reason Instruments reports that the first vector-subtract instruction takes 18% of the time. The regression happens both with the VEX prefix and without. I suspected that the problem is the movupd's that load xmm0 and xmm1. I started looking at some performance counters on Friday, but I did not find anything suspicious yet. > > +0x00 movupd 16(%rsi), %xmm0 > +0x05 movupd 16(%rsp), %xmm1 > +0x0b subpd %xmm1, %xmm0 <———— 18% of the runtime of bh ? > +0x0f movapd %xmm0, %xmm2 > +0x13 mulsd %xmm2, %xmm2 > +0x17 xorpd %xmm1, %xmm1 > +0x1b addsd %xmm2, %xmm1 > > I spent less time on Bullet. Bullet also has one hot function (“resolveSingleConstraintRowLowerLimit”). On this code the vectorizer generates several trees that use the <3 x float> type. This is risky because the loads/stores are inefficient, but unfortunately triples of RGB and XYZ are very popular in some domains and we do want to vectorize them. 
I skimmed through the IR and the assembly and I did not see anything too bad. The next step would be to do a binary search on the places where the vectorizer fires to locate the bad pattern. > > On AVX we have another regression that I did not mention: Flops-7. When we vectorize we cause more spills because we do a poor job scheduling non-destructive source instructions (related to PR10928). Hopefully Andy’s scheduler will fix this regression once it is enabled. > > I did not measure code size, but I did measure compile time. There are 4-5 workloads (not counting workloads that run below 0.5 seconds) where the compile time increase is more than 5%. I am aware of a problem in the (quadratic) code that looks for consecutive stores. This code calls SCEV too many times. I plan to fix this. > > Thanks, > Nadav > > >> We should at least understand what is going wrong there. bh is pretty tiny, so it should be straight-forward. It would also be really useful to see what the code size and compile time impact is. 
>> >> -Chris >> >>> >>> — Performance Gains — >>> SingleSource/Benchmarks/Misc/matmul_f64_4x4 -53.68% >>> MultiSource/Benchmarks/Olden/power/power -18.55% >>> MultiSource/Benchmarks/TSVC/LoopRerolling-flt/LoopRerolling-flt -14.71% >>> SingleSource/Benchmarks/Misc/flops-6 -11.02% >>> SingleSource/Benchmarks/Misc/flops-5 -10.03% >>> MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt -8.37% >>> External/Nurbs/nurbs -7.98% >>> SingleSource/Benchmarks/Misc/pi -7.29% >>> External/SPEC/CINT2000/252_eon/252_eon -5.78% >>> External/SPEC/CFP2006/444_namd/444_namd -4.52% >>> External/SPEC/CFP2000/188_ammp/188_ammp -4.45% >>> MultiSource/Applications/SIBsim4/SIBsim4 -3.58% >>> MultiSource/Benchmarks/TSVC/LoopRerolling-dbl/LoopRerolling-dbl -3.52% >>> SingleSource/Benchmarks/Misc-C++/Large/sphereflake -2.96% >>> MultiSource/Benchmarks/TSVC/LinearDependence-dbl/LinearDependence-dbl -2.75% >>> MultiSource/Benchmarks/VersaBench/beamformer/beamformer -2.70% >>> MultiSource/Benchmarks/TSVC/NodeSplitting-dbl/NodeSplitting-dbl -1.95% >>> SingleSource/Benchmarks/Misc/flops -1.89% >>> SingleSource/Benchmarks/Misc/oourafft -1.71% >>> MultiSource/Benchmarks/mafft/pairlocalalign -1.16% >>> External/SPEC/CFP2006/447_dealII/447_dealII -1.06% >>> >>> — Regressions — >>> MultiSource/Benchmarks/Olden/bh/bh 22.47% >>> MultiSource/Benchmarks/Bullet/bullet 7.31% >>> SingleSource/Benchmarks/Misc-C++-EH/spirit 5.68% >>> SingleSource/Benchmarks/SmallPT/smallpt 3.91% >>> >>> Thanks, >>> Nadav >>> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: report.pdf Type: application/pdf Size: 53595 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From echristo at gmail.com Tue Jul 23 15:34:38 2013 From: echristo at gmail.com (Eric Christopher) Date: Tue, 23 Jul 2013 15:34:38 -0700 Subject: [LLVMdev] Cutting down the number of platform checks In-Reply-To: References: Message-ID: Sure. Preapproved if you feel the need for autoconf. Let me know if you need/want help regenerating. -eric On Tue, Jul 23, 2013 at 3:30 PM, Reid Kleckner wrote: > On Tue, Jul 23, 2013 at 5:53 PM, Óscar Fuentes wrote: >> >> Yes. What you are seeing are the platform checks, where the build system >> looks for the presence of functions, headers, etc and then generates a >> configuration file with that information. > > > I've been meaning to cut down on the number of these because they are super > slow and wasteful. Some of them are dead and can be removed without > discussion. > > Some of them are used inconsistently, like HAVE_STRING_H. Do we really > support any platform that lacks a ? All of Errno.cpp is in an > ifdef for this macro, but I suspect we include string.h elsewhere > unconditionally. > http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Support/Errno.cpp?revision=167191&view=markup > > Is everyone OK with eliminating checks for headers and symbols that we use > unconditionally anyway? (assert.h, mempcy, etc) > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From dblaikie at gmail.com Tue Jul 23 15:37:06 2013 From: dblaikie at gmail.com (David Blaikie) Date: Tue, 23 Jul 2013 15:37:06 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. 
In-Reply-To: References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> Message-ID: On Mon, Jul 22, 2013 at 4:17 PM, Quentin Colombet wrote: > Hi, > > Compared to my previous email, I have added Hal’s idea for formatting the > message and pull back some idea from the "querying framework”. > Indeed, I propose to add some information in the reporting so that a > front-end (more generally a client) can filter the diagnostics or take > proper actions. > See the details hereafter. > > On Jul 22, 2013, at 2:25 PM, Chandler Carruth wrote: > > > On Mon, Jul 22, 2013 at 2:21 PM, Eric Christopher > wrote: >> >> >> This is pretty much the same as what Quentin proposed (with the >> >> addition of the enum), isn't it? >> >> >> > >> > Pretty close yeah. >> > >> >> Another thought and alternate strategy for dealing with these sorts of >> things: >> >> A much more broad set of callback machinery that allows the backend to >> communicate values or other information back to the front end that can >> then decide what to do. We can define an interface around this, but >> instead of having the backend vending diagnostics we have the callback >> take a "do something with this value" which can just be "communicate >> it back to the front end" or a diagnostic callback can be passed down >> from the front end, etc. >> >> This will probably take a bit more design to get a general framework >> set up, but imagine the usefulness of say being able to automatically >> reschedule a jitted function to a thread with a larger default stack >> size if the callback states that the thread size was N+1 where N is >> the size of the stack for a thread you've created. > > > FWIW, *this* is what I was trying to get across. Not that it wouldn't be a > callback-based mechanism, but that it should be a fully general mechanism > rather than having something to do with warnings, errors, notes, etc. 
If a > frontend chooses to use it to produce such diagnostics, cool, but there are > other use cases that the same machinery should serve. > > > I like the general idea. > > To be sure I understood the proposal, let me give an example. > > ** Example ** > The compiler says here is the size of the stack for Loc via a “handler” > (“handler" in the sense whatever mechanism we come up to make such > communication possible). Then the front-end builds the diagnostic from that > information (or query for more if needed) or drops everything if it does not > care about this size for instance (either it does not care at all or the > size is small enough compared to its setting). > > ** Comments ** > Unless we have one handler per -kind of - use, and I would like to avoid > that, I think that's, somewhat, Chandlers point (sorry, I hate to play Telephone here - but hope to help clarify some positions... apologies if this just further obfuscates/confuses). I believe the idea is some kind of generic callback type with a bunch of no-op-default callbacks, override the ones your frontend cares about ("void onStackSize(size_t bytes)", etc...). Yes, we could come up with a system that doesn't require adding a new function call for every data that needs to be provided. Does it seem likely we'll have so many of these that we'll really want/need that? > I think we should still provide an information on the severity of the > thing we are reporting and what we are reporting. > Basically: > - Severity: Will the back-end abort after the information pass down or will > it continue (the boolean of the previous proposal)? In the case of specific callbacks - that would be statically known (you might have a callback for some particular problem (we ran out of registers & can't satisfy this inline asm due to the register allocation of this function) - it's the very contract that that callback is a fatal problem). 
If we have a more generic callback mechanism, yes - we could enshrine some general properties (such as fatality) in the common part & leave the specifics of what kind of fatal problem to the 'blob'. > - Kind: What are we reporting (the enum from the previous proposal)? > > I also think we should be able to provide a default (formatted) message, > such that a client that does not need to know what to do with the > information can still print something somehow useful, especially on abort > cases. Do you have some examples of fatal/abort cases we already have & how they're reported today? (including what kind of textual description we use?) > Thus, it sounds like a good thing to me to have a string with some markers to > format the output plus the arguments to be used in the formatted output. > Hal’s proposal could do the trick (although I do not know if DIDescriptor > are the best thing to use here). > > ** Summary ** > I am starting to think that we should be able to cover the reporting case > plus some querying mechanism with something like: > void reportSomethingToFEHandler(enum ReportingKind Kind, bool IsAbort, <relevant information>, const char* DefaultMsg, <args for the DefaultMsg>) Personally I dislike losing type safety in this kind of API ("here's a blob of data you must programmatically query based on a schema implied by the 'Kind' parameter & some documentation you read"). I'd prefer explicit callbacks per thing - if we're going to have to write an explicit structure & document the parameters to each of these callbacks anyway, it seems easier to document that by API. (for fatal cases we could have no default implementations - this would ensure clients would be required to update for new callbacks & not accidentally suppress them)
> > This looks similar to the “classical” back-end report to front-end approach, > but gives more freedom to the front-end as it can choose what to do based on > the attached information. > I also believe this will reduce the need to expose back-end APIs and speed > up the process. Speed up the process of adding these diagnostic, possibly at the cost of having a more opaque/inscrutible API to data from LLVM, it seems. > However, the ability of the front-end (or client) to query the back-end is > limited to the places where the back-end is reporting something. Also, if > the back-end is meant to abort, the FE cannot do anything about it (e.g., > the stack size is not big enough for the jitted function). > That’s why I said it cover “some" querying mechanism. > > ** Concerns ** > 1. Testing. Testing - I assume we'd have opt/llc register for all these callbacks & print them in some way (it doesn't need to have a "stack size is too small warning" it just needs to print the stack size whenever it's told - or maybe have some way to opt in to callback rendering) & then check the behavior with FileCheck as usual (perhaps print this stuff to stderr so it doesn't get confused with bytecode/asm under -o -). That tests LLVM's contract - that it called the notifications. Testing Clang's behavior when these notifications are provided would either require end-to-end testing (just having Clang tests that run LLVM, assume it already passes the LLVM-only tests & then tests Clang behavior on top of that) as we do in a few places already - or have some kind of stub callback implementation we can point Clang to (it could read a file of callbacks to call). That would be nice, but going on past experience I don't suppose anyone would actually bother to implement it. 
> Assuming we will always emit these reports, relying on a front-end to filter > out what is not currently relevant (e.g., we did not set the stack size > warning in the FE), what will happen when we test (make check) without a > front-end? > I am afraid we will pollute all tests or we will have some difficulty > testing a specific reporting. > > 2. Regarding a completely query-based approach, like Chris pointed out, I do > not see how we can report consistent information at any given time. Also, > Eric, coming back to your jit example, how could we prevent the back-end > from aborting if the jitted function is too big for the stack? Eric's (originally Chandler's, discussed in person) example wasn't about aborting compilation. The idea was you JIT a function, you get a callback saying "stack size 1 million bytes" and so you spin up a thread that has a big stack to run this function you just compiled. The point of the example is that a pure warning-based callback is 'general' for LLVM but very specific for LLVM /clients/ (LLVM as a library, something we should keep in mind) - all they can do is print it out. If we provide a more general feature for LLVM clients (callbacks that notify those clients about things they might be interested in, like the size of a function's stack) then they can build other features (apart from just warning) such as a JIT that dynamically chooses thread stack size based on the stack size of the functions it JITs. > > 3. Back to the strictly reporting approach where we extend the inlineasm > handler (the approach proposed by Bob and that I sketched a little bit > more), now looks similar to this approach except that the back-end chooses > what is relevant to report and the back-end does not need to pass down > the information. > The concern is how do we easily (in a robust and extendable manner) provide > a front-end/back-end option for each warning/error? > > > Thoughts?
> > Cheers, > > -Quentin > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From renato.golin at linaro.org Tue Jul 23 15:40:01 2013 From: renato.golin at linaro.org (Renato Golin) Date: Tue, 23 Jul 2013 23:40:01 +0100 Subject: [LLVMdev] -Os In-Reply-To: <6B918A28-DDB7-4D3A-B446-C717B702A3FB@apple.com> References: <51EE6A33.2010203@mips.com> <3431374581995@web12h.yandex.ru> <51EE756F.80503@mips.com> <6B918A28-DDB7-4D3A-B446-C717B702A3FB@apple.com> Message-ID: On 23 July 2013 19:36, Jim Grosbach wrote: > This isn’t just a nitpick. This is exactly why you’re seeing differences. > The pass managers aren’t always set up the same, for example. > > FWIW, I feel your pain. This is a long-standing weakness of our > infrastructure. > Jim, A while ago I proposed that we annotated the options the front-end passed to the back-end on the IR with named metadata, but it didn't catch on. Would it make sense to have some call-back mechanism while setting back-end flags to keep a tab on what's called and have a dump as metadata, so that you can just write it to the IR file at the end? More or less what we have for functions, currently. This would hint llc, lli and others to what flags it must set itself (architecture, optimizations, etc) and would minimize the impact of split compilation. Those tools are free to ignore any option it doesn't recognize, of course, as with any metadata. Another way would be to teach llc, lli and others all command line options of all supported front-ends, but that wouldn't be very productive, I think. cheers, --renato -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From grosbach at apple.com Tue Jul 23 15:59:22 2013 From: grosbach at apple.com (Jim Grosbach) Date: Tue, 23 Jul 2013 15:59:22 -0700 Subject: [LLVMdev] -Os In-Reply-To: References: <51EE6A33.2010203@mips.com> <3431374581995@web12h.yandex.ru> <51EE756F.80503@mips.com> <6B918A28-DDB7-4D3A-B446-C717B702A3FB@apple.com> Message-ID: On Jul 23, 2013, at 3:40 PM, Renato Golin wrote: > On 23 July 2013 19:36, Jim Grosbach wrote: > This isn’t just a nitpick. This is exactly why you’re seeing differences. The pass managers aren’t always set up the same, for example. > > FWIW, I feel your pain. This is a long-standing weakness of our infrastructure. > > Jim, > > A while ago I proposed that we annotated the options the front-end passed to the back-end on the IR with named metadata, but it didn't catch on. > > Would it make sense to have some call-back mechanism while setting back-end flags to keep a tab on what's called and have a dump as metadata, so that you can just write it to the IR file at the end? More or less what we have for functions, currently. > > This would hint llc, lli and others to what flags it must set itself (architecture, optimizations, etc) and would minimize the impact of split compilation. Those tools are free to ignore any option it doesn't recognize, of course, as with any metadata. > > Another way would be to teach llc, lli and others all command line options of all supported front-ends, but that wouldn't be very productive, I think. Maybe? I’m not sure what a good answer is here. Bob and Bill have both looked at this in more detail than I. What do you guys think? -Jim -------------- next part -------------- An HTML attachment was scrubbed... URL: From rasha.sala7 at gmail.com Tue Jul 23 16:28:12 2013 From: rasha.sala7 at gmail.com (Rasha Omar) Date: Wed, 24 Jul 2013 01:28:12 +0200 Subject: [LLVMdev] Steps to addDestination In-Reply-To: References: Message-ID: 1- I need the first example. 
2- I set the Address uninitialized according to the documentation "Setting the name on the Value automatically updates the module's symbol table" from Value.h source code 3- I'm not sure about "select" instruction, you mean that the address is the new destination (basic block) that will be added Thanks On 23 July 2013 16:38, Tim Northover wrote: > Hi Rasha, > > > I need to addDestination to some basic blocks > > Just to make sure there's no confusion here: you really are trying to > create code like: > > define i32 @foo(i1 %tst) { > %Address = select i1 %tst, i8* blockaddress(@foo, %true), i8* > blockaddress(@foo, %false) > indirectbr i8* %Address, [label %true, label %false] ; This is what > you're creating > true: > ret i32 42 > false: > ret i32 0 > } > > and not: > > define i32 @bar(i1 %tst) { > br i1 %tst, label %true, label %false ; You're not trying to create this > true: > ret i32 42 > false: > ret i32 0 > } > > If that's incorrect, you're going about it entirely the wrong way. We > can help with either, but it's best not to go too far on a > misunderstanding. > > > Value* Address; > > IndirectBrInst *IBI = IndirectBrInst::Create(Address, > Result.size(),i->getTerminator() ); > > > > IBI->addDestination(i); > > The problem seems to be that "Address" is uninitialised but it needs > to be set to some valid pointer value before creating your IndirectBr. > To replicate my first example it would be set to the "select" > instruction, for example; whatever produces your destination > blockaddress. > > Cheers. > > Tim. -- *Rasha Salah Omar Msc Student at E-JUST Demonstrator at Faculty of Computers and Informatics Benha University * -------------- next part -------------- An HTML attachment was scrubbed...
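In API terms, Tim's fix amounts to this: the `Value*` handed to `IndirectBrInst::Create` must already produce a real `i8*` block address — a `BlockAddress` constant, or a `select`/`phi` over several of them — never a default-constructed pointer. A sketch against the LLVM C++ API of this era (overload signatures quoted from memory; verify them against your tree before use):

```cpp
// Sketch only -- assumes the LLVM headers (BasicBlock.h, Instructions.h,
// Constants.h in 3.3) are available; not a drop-in patch.
using namespace llvm;

BasicBlock *Dest = /* the block the indirect branch may reach */;
Instruction *InsertBefore = /* e.g. the terminator being replaced */;

// 1. Materialize a genuine address for the destination block.
Value *Address = BlockAddress::get(Dest->getParent(), Dest);

// 2. Create the indirectbr with that address, reserving one destination.
IndirectBrInst *IBI =
    IndirectBrInst::Create(Address, /*NumDests=*/1, InsertBefore);

// 3. Register every block the address might name; indirectbr's semantics
//    require all possible targets to appear in its destination list.
IBI->addDestination(Dest);
```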
URL: From hfinkel at anl.gov Tue Jul 23 16:33:41 2013 From: hfinkel at anl.gov (Hal Finkel) Date: Tue, 23 Jul 2013 18:33:41 -0500 (CDT) Subject: [LLVMdev] Enabling the SLP vectorizer by default for -O3 In-Reply-To: <10520198-A465-4F22-A13C-8719B3F82A7E@apple.com> Message-ID: <1743304566.13907077.1374622421355.JavaMail.root@alcf.anl.gov> ----- Original Message ----- > > > Hi, > > > Sorry for the delay in response. I measured the code size change and > noticed small changes in both directions for individual programs. I > found a 30k binary size growth for the entire testsuite + SPEC. I > attached an updated performance report that includes both compile > time and performance measurements. > I think that these numbers look good. Regarding the performance regressions: This looks like noise: MultiSource/Benchmarks/McCat/08-main/main 44.40% 0.0277 0.0400 0.0000 For these two: MultiSource/Benchmarks/Olden/bh/bh 19.73% 1.1547 1.3825 0.0017 MultiSource/Benchmarks/Bullet/bullet 7.30% 3.6130 3.8767 0.0069 can you run them on a different CPU and see how generic these slowdowns are? Thanks again, Hal > > Thanks, > Nadav > > > > On Jul 14, 2013, at 10:55 PM, Nadav Rotem < nrotem at apple.com > wrote: > > > > > > On Jul 14, 2013, at 9:52 PM, Chris Lattner < clattner at apple.com > > wrote: > > > > > On Jul 13, 2013, at 11:30 PM, Nadav Rotem < nrotem at apple.com > wrote: > > > > Hi, > > LLVM’s SLP-vectorizer is a new pass that combines similar independent > instructions in straight-line code. It is currently not enabled by > default, and people who want to experiment with it can use the clang > command line flag “-fslp-vectorize”. I ran LLVM’s test suite with > and without the SLP vectorizer on a Sandybridge mac (using SSE4, w/o > AVX). Based on my performance measurements (below) I would like to > enable the SLP-vectorizer by default on -O3.
I would like to hear > what others in the community think about this and give other people > the opportunity to perform their own performance measurements. > > This looks great Nadav. The performance wins are really big. Have you > investigated the bh and bullet regressions though? > > > Thanks. Yes, I looked at both. The hot function in BH is “gravsub”. > The vectorized IR looks fine and the assembly looks fine, but for > some reason Instruments reports that the first vector-subtract > instruction takes 18% of the time. The regression happens both with > the VEX prefix and without. I suspected that the problem is the > movupd's that load xmm0 and xmm1. I started looking at some > performance counters on Friday, but I did not find anything > suspicious yet. > > +0x00 movupd 16(%rsi), %xmm0 > +0x05 movupd 16(%rsp), %xmm1 > +0x0b subpd %xmm1, %xmm0 <———— 18% of the runtime of bh ? > +0x0f movapd %xmm0, %xmm2 > +0x13 mulsd %xmm2, %xmm2 > +0x17 xorpd %xmm1, %xmm1 > > +0x1b addsd %xmm2, %xmm1 > > > I spent less time on Bullet. Bullet also has one hot function > (“resolveSingleConstraintRowLowerLimit”). On this code the > vectorizer generates several trees that use the <3 x float> type. > This is risky because the loads/stores are inefficient, but > unfortunately triples of RGB and XYZ are very popular in some > domains and we do want to vectorize them. I skimmed through the IR > and the assembly and I did not see anything too bad. The next step > would be to do a binary search on the places where the vectorizer > fires to locate the bad pattern. > > > On AVX we have another regression that I did not mention: Flops-7. > When we vectorize we cause more spills because we do a poor job > scheduling non-destructive source instructions (related to PR10928). > Hopefully Andy’s scheduler will fix this regression once it is > enabled. > > > I did not measure code size, but I did measure compile time.
There > are 4-5 workloads (not counting workloads that run below 0.5 > seconds) where the compile time increase is more than 5%. I am aware > of a problem in the (quadratic) code that looks for consecutive > stores. This code calls SCEV too many times. I plan to fix this. > > > Thanks, > Nadav > > > > > > > We should at least understand what is going wrong there. bh is pretty > tiny, so it should be straight-forward. It would also be really > useful to see what the code size and compile time impact is. > > -Chris > > > > > — Performance Gains — > SingleSource/Benchmarks/Misc/matmul_f64_4x4 -53.68% > MultiSource/Benchmarks/Olden/power/power -18.55% > MultiSource/Benchmarks/TSVC/LoopRerolling-flt/LoopRerolling-flt > -14.71% > SingleSource/Benchmarks/Misc/flops-6 -11.02% > SingleSource/Benchmarks/Misc/flops-5 -10.03% > MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt > -8.37% > External/Nurbs/nurbs -7.98% > SingleSource/Benchmarks/Misc/pi -7.29% > External/SPEC/CINT2000/252_eon/252_eon -5.78% > External/SPEC/CFP2006/444_namd/444_namd -4.52% > External/SPEC/CFP2000/188_ammp/188_ammp -4.45% > MultiSource/Applications/SIBsim4/SIBsim4 -3.58% > MultiSource/Benchmarks/TSVC/LoopRerolling-dbl/LoopRerolling-dbl > -3.52% > SingleSource/Benchmarks/Misc-C++/Large/sphereflake -2.96% > MultiSource/Benchmarks/TSVC/LinearDependence-dbl/LinearDependence-dbl > -2.75% > MultiSource/Benchmarks/VersaBench/beamformer/beamformer -2.70% > MultiSource/Benchmarks/TSVC/NodeSplitting-dbl/NodeSplitting-dbl > -1.95% > SingleSource/Benchmarks/Misc/flops -1.89% > SingleSource/Benchmarks/Misc/oourafft -1.71% > MultiSource/Benchmarks/mafft/pairlocalalign -1.16% > External/SPEC/CFP2006/447_dealII/447_dealII -1.06% > > — Regressions — > MultiSource/Benchmarks/Olden/bh/bh 22.47% > MultiSource/Benchmarks/Bullet/bullet 7.31% > SingleSource/Benchmarks/Misc-C++-EH/spirit 5.68% > SingleSource/Benchmarks/SmallPT/smallpt 3.91% > > Thanks, > Nadav > > > 
_______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory From qcolombet at apple.com Tue Jul 23 16:55:18 2013 From: qcolombet at apple.com (Quentin Colombet) Date: Tue, 23 Jul 2013 16:55:18 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> Message-ID: <7C305296-ADC0-4279-8DC2-A0339F14ADB8@apple.com> Hi David, Thanks for the feedback/clarifications; I appreciate it. You will find my comments inlined with your mail. For a quick reference, here is the summary. ** Summary ** The other considered approach would provide some callbacks for events/information we think are important. Like Chris pointed out, some of the information is not available for querying, thus the callback should provide sufficient information. The client of the callback could then query LLVM if the callback does not provide sufficient information (and assuming more information would be available via querying). At the moment, I see the following use cases: - Optimization hints (see Hal’s idea). - Fatal error/warning reporting (this could be done via the first proposal: enum + bool + message). - Stack size reporting. What else? Thoughts? Thanks again for all the feedback.
Cheers, -Quentin On Jul 23, 2013, at 3:37 PM, David Blaikie wrote: > On Mon, Jul 22, 2013 at 4:17 PM, Quentin Colombet wrote: >> Hi, >> >> Compared to my previous email, I have added Hal’s idea for formatting the >> message and pull back some idea from the "querying framework”. >> Indeed, I propose to add some information in the reporting so that a >> front-end (more generally a client) can filter the diagnostics or take >> proper actions. >> See the details hereafter. >> >> On Jul 22, 2013, at 2:25 PM, Chandler Carruth wrote: >> >> >> On Mon, Jul 22, 2013 at 2:21 PM, Eric Christopher >> wrote: >>> >>>>> This is pretty much the same as what Quentin proposed (with the >>>>> addition of the enum), isn't it? >>>>> >>>> >>>> Pretty close yeah. >>>> >>> >>> Another thought and alternate strategy for dealing with these sorts of >>> things: >>> >>> A much more broad set of callback machinery that allows the backend to >>> communicate values or other information back to the front end that can >>> then decide what to do. We can define an interface around this, but >>> instead of having the backend vending diagnostics we have the callback >>> take a "do something with this value" which can just be "communicate >>> it back to the front end" or a diagnostic callback can be passed down >>> from the front end, etc. >>> >>> This will probably take a bit more design to get a general framework >>> set up, but imagine the usefulness of say being able to automatically >>> reschedule a jitted function to a thread with a larger default stack >>> size if the callback states that the thread size was N+1 where N is >>> the size of the stack for a thread you've created. >> >> >> FWIW, *this* is what I was trying to get across. Not that it wouldn't be a >> callback-based mechanism, but that it should be a fully general mechanism >> rather than having something to do with warnings, errors, notes, etc. 
If a >> frontend chooses to use it to produce such diagnostics, cool, but there are >> other use cases that the same machinery should serve. >> >> >> I like the general idea. >> >> To be sure I understood the proposal, let me give an example. >> >> ** Example ** >> The compiler says here is the size of the stack for Loc via a “handler” >> (“handler" in the sense of whatever mechanism we come up with to make such >> communication possible). Then the front-end builds the diagnostic from that >> information (or queries for more if needed) or drops everything if it does not >> care about this size for instance (either it does not care at all or the >> size is small enough compared to its setting). >> >> ** Comments ** >> Unless we have one handler per kind of use, and I would like to avoid >> that, > > I think that's, somewhat, Chandler's point (sorry, I hate to play > Telephone here - but hope to help clarify some positions... apologies > if this just further obfuscates/confuses). I believe the idea is some > kind of generic callback type with a bunch of no-op-default callbacks, > override the ones your frontend cares about ("void onStackSize(size_t > bytes)", etc...). I see. > > Yes, we could come up with a system that doesn't require adding a new > function call for every piece of data that needs to be provided. Does it seem > likely we'll have so many of these that we'll really want/need that? That’s a good point. I guess there are use cases that we may not anticipate and it would be nice if we do not have to modify this interface for that. In particular, out-of-tree targets may want to do fancy things without pushing for changes in the public tree. For now, I believe it is great that many people have contributed their ideas and use cases; we can then decide what we want and what we do not want to address in the near future. Anyway, I agree that we should focus on what we care about the most when we are done with collecting the use cases (are we done?).
> >> I think we should still provide information on the severity of the >> thing we are reporting and what we are reporting. >> Basically: >> - Severity: Will the back-end abort after the information is passed down or will >> it continue (the boolean of the previous proposal)? > In the case of specific callbacks - that would be statically known > (you might have a callback for some particular problem (we ran out of > registers & can't satisfy this inline asm due to the register > allocation of this function) - it's the very contract that that > callback is a fatal problem). If we have a more generic callback > mechanism, yes - we could enshrine some general properties (such as > fatality) in the common part & leave the specifics of what kind of > fatal problem to the 'blob’. Agreed. > >> - Kind: What are we reporting (the enum from the previous proposal)? >> >> I also think we should be able to provide a default (formatted) message, >> such that a client that does not need to know what to do with the >> information can still print something somehow useful, especially on abort >> cases. > > Do you have some examples of fatal/abort cases we already have & how > they're reported today? (including what kind of textual description we > use?) Here are a few examples of warnings that you can find in the back end: - IntrinsicLowering: case Intrinsic::stackrestore: { if (!Warned) errs() << “WARNING: this target does not support the llvm.stack" << (Callee->getIntrinsicID() == Intrinsic::stacksave ?
"save" : "restore") << " intrinsic.\n”; - PrologEpilogInserter: errs() << "warning: Stack size limit exceeded (" << MFI->getStackSize() << ") in " << Fn.getName() << ".\n”; Actually, when I wrote this, I had Hal’s Optimization Diary in mind: - "*This loop* cannot be optimized because the induction variable *i* is unsigned, and cannot be proved not to wrap" - "*This loop* cannot be vectorized because the compiler cannot prove that memory read from *a* does not overlap with memory written to through *b*" - "*This loop* cannot be vectorized because the compiler cannot prove that it is unconditionally safe to dereference the pointer *a*. - The message string is text but a single kind of markup is allowed: , for example: "We cannot vectorize because is an unfriendly variable" (where the first will be replaced by text derived from a DIScope and the second from a DIVariable). > >> Thus, it sounds a good thing to me to have a string with some markers to >> format the output plus the arguments to be used in the formatted output. >> Hal’s proposal could do the trick (although I do not know if DIDescriptor >> are the best thing to use here). >> >> ** Summary ** >> I am starting to think that we should be able to cover the reporting case >> plus some querying mechanism with something like: >> void reportSomehtingToFEHandler(enum Reporting Kind, bool IsAbort, > information>, const char* DefautMsg, > the defautMsg>) > > Personally I dislike losing type safety in this kind of API ("here's a > blob of data you must programmatically query based on a schema implied > by the 'Kind' parameter & some documentation you read"). I'd prefer > explicit callbacks per thing - if we're going to have to write an > explicit structure & document the parameters to each of these > callbacks anyway, it seems easier to document that by API. 
(for fatal > cases we could have no default implementations - this would ensure > clients would be required to update for new callbacks & not > accidentally suppress them) If we want to let people do everything they want without modifying the existing structure, I think we need both. I agree that for the cases we care about the most, we could specialize that approach with a specific callback for each case. Also note that the initial intent was to report errors/warnings, thus the <kind-related information> is not required. My point is, if we do not rely on the front-end to take specific action, there is no need to pass this information. Therefore, we can eliminate that type safety problem. > >> Where <kind-related information> is supposed to be the class/struct/pointer to the >> relevant information for this kind. If it is not enough the FE should call >> additional APIs to get what it wants. >> >> This looks similar to the “classical” back-end report to front-end approach, >> but gives more freedom to the front-end as it can choose what to do based on >> the attached information. >> I also believe this will reduce the need to expose back-end APIs and speed >> up the process. > > Speed up the process of adding these diagnostics, possibly at the cost > of having a more opaque/inscrutable API to data from LLVM, it seems. Good point, and it matches what I have written above. Indeed, if we rely on the front-end to query for more information when it gets a report like this, we do not need to pass the information and we avoid this problem. > >> However, the ability of the front-end (or client) to query the back-end is >> limited to the places where the back-end is reporting something. Also, if >> the back-end is meant to abort, the FE cannot do anything about it (e.g., >> the stack size is not big enough for the jitted function). >> That’s why I said it covers “some" querying mechanism. >> >> ** Concerns ** >> 1. Testing.
> > Testing - I assume we'd have opt/llc register for all these callbacks > & print them in some way (it doesn't need to have a "stack size is too > small warning" it just needs to print the stack size whenever it's > told - or maybe have some way to opt in to callback rendering) & then > check the behavior with FileCheck as usual (perhaps print this stuff > to stderr so it doesn't get confused with bytecode/asm under -o -). > > That tests LLVM's contract - that it called the notifications. > Testing Clang's behavior when these notifications are provided would > either require end-to-end testing (just having Clang tests that run > LLVM, assume it already passes the LLVM-only tests & then tests Clang > behavior on top of that) as we do in a few places already - or have > some kind of stub callback implementation we can point Clang to (it > could read a file of callbacks to call). That would be nice, but going > on past experience I don't suppose anyone would actually bother to > implement it. > >> >> Assuming we will always emit these reports, relying on a front-end to filter >> out what is not currently relevant (e.g., we did not set the stack size >> warning in the FE), what will happen when we test (make check) without a >> front-end? >> I am afraid we will pollute all tests or we will have some difficulty to >> test a specific reporting. >> >> 2. Regarding a completely query based approach, like Chris pointed out, I do >> not see how we can report consistent information at any given time. Also, >> Eric, coming back to your jit example, how could we prevent the back-end to >> abort if the jitted is too big for the stack? > > Eric's (originally Chandler's discussed in person) example wasn't > about aborting compilation. The idea was you JIT a function, you get a > callback saying "stack size 1 million bytes" and so you spin up a > thread that has a big stack to run this function you just compiled. Thanks for the clarification. 
> > The point of the example is that a pure warning-based callback is > 'general' for LLVM but very specific for LLVM /clients/ (LLVM as a > library, something we should keep in mind) - all they can do is print > it out. If we provide a more general feature for LLVM clients > (callbacks that notify those clients about things they might be > interested in, like the size of a function's stack) then they can > build other features (apart from just warning) such as a JIT that > dynamically chooses thread stack size based on the stack size of the > functions it JITs. I see your point. > >> >> 3. Back to the strictly reporting approach where we extend the inlineasm >> handler (the approach proposed by Bob and that I sketched a little bit >> more): it now looks similar to this approach except that the back-end chooses >> what is relevant to report and the back-end does not need to pass down >> the information. >> The concern is how do we easily (in a robust and extendable manner) provide >> a front-end/back-end option for each warning/error? >> >> >> Thoughts? >> >> Cheers, >> >> -Quentin >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From rasha.sala7 at gmail.com Tue Jul 23 17:00:41 2013 From: rasha.sala7 at gmail.com (Rasha Omar) Date: Wed, 24 Jul 2013 02:00:41 +0200 Subject: [LLVMdev] Insert new basic blocks Message-ID: Using a Module Pass, how could I insert new basic blocks such as while(i==1) {} in the IR, and also update the predecessors and successors after inserting these basic blocks?
-- *Rasha Salah Omar MSc Student at E-JUST Demonstrator at Faculty of Computers and Informatics, Benha University * -------------- next part -------------- An HTML attachment was scrubbed... URL: From pichet2000 at gmail.com Tue Jul 23 19:53:29 2013 From: pichet2000 at gmail.com (Francois Pichet) Date: Tue, 23 Jul 2013 22:53:29 -0400 Subject: [LLVMdev] [cfe-dev] Host compiler requirements: Dropping VS 2008, using C++11? In-Reply-To: References: Message-ID: On Tue, Jul 23, 2013 at 10:17 PM, Wang Qi wrote: > -1. > > I believe there are still a lot of people using VC 2008, though I can't > give the data. > VC 2008 (even the express version) is enough for a lot of development, > such like game development. > > Most likely I myself will continue using VC 2008 for at least two or more > years. > > OK, but clang/llvm need to move forward too. VC 2008 is 5 years old now. The MSVC express edition is free and can be used to develop LLVM/CLANG, so there is no excuse ($$$ or license) not to update to a more recent edition. Personally I think we should support the current and the previous version of MSVC. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From tobias at grosser.es Tue Jul 23 20:42:58 2013 From: tobias at grosser.es (Tobias Grosser) Date: Tue, 23 Jul 2013 20:42:58 -0700 Subject: [LLVMdev] Analysis of polly-detect overhead in oggenc In-Reply-To: References: <51E02B5C.3030208@grosser.es> <727cd06.251b.13fd9059546.Coremail.tanmx_star@yeah.net> <51E19CAF.4050500@grosser.es> <73bc0067.345f.13fdb666d45.Coremail.tanmx_star@yeah.net> <51E2352A.6090706@grosser.es> <7aa4b9a.3bcb.13fdc808c5f.Coremail.tanmx_star@yeah.net> <32e1ec0e.464a.13fddb6c5c7.Coremail.tanmx_star@yeah.net> <51E31F3E.3080501@grosser.es> <51EC1D0F.6000800@grosser.es> <1f586e46.3cca.1400439c171.Coremail.tanmx_star@yeah.net> <51ECB235.4060901@grosser.es> <8e12161.64f9.140058f158f.Coremail.tanmx_star@yeah.net> <51ED4164.9000008@grosser.es> <3ae8cb58.2cb9.1400a524281.Coremail.tanmx_star@yeah.net> <51EEA634.90109@grosser.es> Message-ID: <51EF4D42.5060804@grosser.es> On 07/23/2013 10:13 AM, James Courtier-Dutton wrote: > On 23 July 2013 16:50, Tobias Grosser wrote: > >> On 07/22/2013 11:58 PM, Star Tan wrote: >> >>> Hi Tobias, >>> >>> >>> I have attached a patch file to optimize string operations in >>> Polly-Detect pass. >>> In this patch file, I put most of long string operations in the condition >>> variable of "PollyViewMode" or in the DEBUG mode. >>> >> >> OK. Hi James, > Is there any way to make the debug messages themselves more efficient? Yes, there are two ways: 1) Use the getName() function instead of the ostream writer This is significantly faster than what we have today, but does not format unnamed instructions correctly (It does not create the %123 instruction namings). Even though this is fast, it is probably still slower than not doing any formatting at all. 2) Fix the AssemblyWriter as decribed by Daniel Berlin There seems to be a larger issue in the printing infrastructure. It does not seem to be written to print values many times. 
> Perhaps reimplementing it so that any table lookups are quick and no > malloc/free is done in the process. Citing Daniel: "The real fix is either to stop recreating these AssemblyWriter objects, or improve caching in the bowels of it so that it doesn't need to rerun typefinder again and again if nothing has changed." So there is something that needs to be fixed when using the ostream formatter as we do. However, I do not think we should do this in this patch. Doing any kind of debug message formatting in the normal pass is unnecessary overhead that we should not have at all. The change Star proposes, moves this overhead out of the hot path, such that for debug messages we can now prioritize understandable formatting over performance. Cheers, Tobias From sam at darkfunction.com Mon Jul 22 03:18:17 2013 From: sam at darkfunction.com (sam at darkfunction.com) Date: Mon, 22 Jul 2013 11:18:17 +0100 Subject: [LLVMdev] Libclang get class name from DeclRefExpr Message-ID: Hi guys, I am trying to extract the class name of a parameter to a method call in objective-C. The code I am parsing is: - (void)testAddConcreteDataModel:(DFDemoDataModelOne*)helpmeh { [self.dataModels addObject:helpmeh]; } And the result I need is the type of class of helpmeh, which is "DFDemoDataModelOne". 
So far I have the following code, which outputs: "[(DFDataModelContainer).dataModels addObject:helpmeh]" if (cursor.kind == CXCursor_ObjCMessageExpr) { __block NSString* memberName = nil; __block NSString* ownerClassName = nil; __block NSString* methodName = [NSString stringWithUTF8String:clang_getCString(clang_getCursorDisplayName(cursor))]; clang_visitChildrenWithBlock(cursor, ^enum CXChildVisitResult(CXCursor cursor, CXCursor parent) { if (cursor.kind == CXCursor_MemberRefExpr) { memberName = [NSString stringWithUTF8String:clang_getCString(clang_getCursorDisplayName(cursor))]; ownerClassName = [NSString stringWithUTF8String:clang_getCString(clang_getCursorDisplayName(clang_getCursorSemanticParent(clang_getCursorReferenced(cursor))))]; } else { if (memberName) { NSString* param = [NSString stringWithUTF8String:clang_getCString(clang_getCursorDisplayName(cursor))]; NSLog(@"[(%@).%@ %@%@]", ownerClassName, memberName, methodName, param); clang_visitChildrenWithBlock(cursor, ^enum CXChildVisitResult(CXCursor cursor, CXCursor parent) { // test if ([param isEqualToString:@"helpmeh"] && cursor.kind == CXCursor_DeclRefExpr) { // found the interesting part.. what now? 
} return CXChildVisit_Recurse; } } } return CXChildVisit_Continue; } } I'm just a bit lost as to how to extract information from cursors- when I AST dump my class I can see the information I need is all there (see the last line): |-ObjCMethodDecl 0x112790f90 - testAddConcreteDataModel: 'void' | |-ImplicitParamDecl 0x112791960 <> self 'DFDataModelContainer *const __strong' | |-ImplicitParamDecl 0x1127919c0 <> _cmd 'SEL':'SEL *' | |-ParmVarDecl 0x112791040 helpmeh 'DFDemoDataModelOne *__strong' | `-CompoundStmt 0x112791bf0 | `-ExprWithCleanups 0x112791bd8 'void' | `-ObjCMessageExpr 0x112791ba0 'void' selector=addObject: | |-PseudoObjectExpr 0x112791b48 'NSMutableArray *' | | |-ObjCPropertyRefExpr 0x112791ad0 '' lvalue objcproperty Kind=PropertyRef Property="dataModels" Messaging=Getter | | | `-OpaqueValueExpr 0x112791ab0 'DFDataModelContainer *' | | | `-ImplicitCastExpr 0x112791a40 'DFDataModelContainer *' | | | `-DeclRefExpr 0x112791a18 'DFDataModelContainer *const __strong' lvalue ImplicitParam 0x112791960 'self' 'DFDataModelContainer *const __strong' | | |-OpaqueValueExpr 0x112791ab0 'DFDataModelContainer *' | | | `-ImplicitCastExpr 0x112791a40 'DFDataModelContainer *' | | | `-DeclRefExpr 0x112791a18 'DFDataModelContainer *const __strong' lvalue ImplicitParam 0x112791960 'self' 'DFDataModelContainer *const __strong' | | `-ImplicitCastExpr 0x112791b30 'NSMutableArray *' | | `-ObjCMessageExpr 0x112791b00 'NSMutableArray *' selector=dataModels | | `-OpaqueValueExpr 0x112791ab0 'DFDataModelContainer *' | | `-ImplicitCastExpr 0x112791a40 'DFDataModelContainer *' | | `-DeclRefExpr 0x112791a18 'DFDataModelContainer *const __strong' lvalue ImplicitParam 0x112791960 'self' 'DFDataModelContainer *const __strong' | `-ImplicitCastExpr 0x112791b88 'id':'id' | `-ImplicitCastExpr 0x112791b70 'DFDemoDataModelOne *' | `-DeclRefExpr 0x112791a88 'DFDemoDataModelOne *__strong' lvalue ParmVar 0x112791040 'helpmeh' 'DFDemoDataModelOne *__strong' I have tried grabbing the semantic 
parent of the DeclRefExpr but the cursor kind is CXCursor_FirstInvalid. Now I'm stuck! Any help appreciated, Sam From wqking at outlook.com Tue Jul 23 19:17:02 2013 From: wqking at outlook.com (Wang Qi) Date: Wed, 24 Jul 2013 02:17:02 +0000 Subject: [LLVMdev] [cfe-dev] Host compiler requirements: Dropping VS 2008, using C++11? In-Reply-To: References: , , , Message-ID: -1. I believe there are still a lot of people using VC 2008, though I can't give the data. VC 2008 (even the express version) is enough for a lot of development, such like game development. Most likely I myself will continue using VC 2008 for at least two or more years. Unlike other C++ compiler, **Clang is not only a compiler, but also a great open source library**. An open source library should better keep lowest compiler/platform requirements, when possible. Also, seems Microsoft's development tools have quite long life and can spread into many years. For example, VC6 is not complete dead up to today... (some time ago I read on Reddit comment that somebody still needs to maintain a code base written with VC6). ---------------------------------------- > From: ahmed.bougacha at gmail.com > Date: Tue, 23 Jul 2013 10:47:42 -0700 > To: chandlerc at google.com > CC: cfe-dev at cs.uiuc.edu; llvmdev at cs.uiuc.edu > Subject: Re: [cfe-dev] Host compiler requirements: Dropping VS 2008, using C++11? > > I (arduously) updated the docs, so VS 2008 support is gone now. > > I’m still curious about C++11 though! > > -- Ahmed > > _______________________________________________ > cfe-dev mailing list > cfe-dev at cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev From chandlerc at google.com Tue Jul 23 21:12:15 2013 From: chandlerc at google.com (Chandler Carruth) Date: Tue, 23 Jul 2013 21:12:15 -0700 Subject: [LLVMdev] [cfe-dev] Host compiler requirements: Dropping VS 2008, using C++11? In-Reply-To: References: Message-ID: On Tue, Jul 23, 2013 at 7:17 PM, Wang Qi wrote: > -1. 
> > I believe there are still a lot of people using VC 2008, though I can't > give the data. > VC 2008 (even the express version) is enough for a lot of development, > such like game development. > > Most likely I myself will continue using VC 2008 for at least two or more > years. > > Unlike other C++ compiler, **Clang is not only a compiler, but also a > great open source library**. An open source library should better keep > lowest compiler/platform requirements, when possible. > > Also, seems Microsoft's development tools have quite long life and can > spread into many years. For example, VC6 is not complete dead up to > today... (some time ago I read on Reddit comment that somebody still needs > to maintain a code base written with VC6). > Unless someone within the community steps forward with a powerful argument to continue to support VC 2008, I'm going to make the call: we don't support it. Why? I don't actually disagree with your points, but I think there are overriding concerns: 1) The pragmatic fact is that we simply don't have enough contributors and active (within the community) users that use, exercise, report bugs, and provide patches to support VC2008 to credibly say we support it. The fact is that we already don't, and we won't going forward regardless of what folks say on this email thread. 2) #1 isn't a problem that it is worth it to the community to solve. That is, I would rather have the developers and members of the community working to better support more modern Windows platforms rather than this old one. So I think it is actively in our best interest to not invest in changing #1. 3) LLVM (and its subprojects) have a long history of beneficially tracking and leveraging modern aspects of C++. We want to do more of this faster, not less of it slower, because it significantly improves the cleanliness, maintainability, simplicity, and performance of our libraries. 
To this end, it is directly to the benefit of the project to stay as close as possible to the latest versions of the various toolchain vendors. 4) Users of LLVM that are necessarily dealing with an unchanging toolchain and environment always have the option of freezing their version of LLVM along with that environment, or working assiduously to build a sufficiently strong role within the community to both provide the necessary testing and fixes for the environment (#1 above) and overcome the burden it places on the rest of the project (#3). At this point, I suspect we should put the subject to rest. -------------- next part -------------- An HTML attachment was scrubbed... URL: From noloader at gmail.com Tue Jul 23 23:21:13 2013 From: noloader at gmail.com (Jeffrey Walton) Date: Wed, 24 Jul 2013 02:21:13 -0400 Subject: [LLVMdev] [cfe-dev] Host compiler requirements: Dropping VS 2008, using C++11? In-Reply-To: References: Message-ID: On Tue, Jul 23, 2013 at 10:17 PM, Wang Qi wrote: > ... > I believe there are still a lot of people using VC 2008, though I can't give the data. > VC 2008 (even the express version) is enough for a lot of development, such like game development. > > Most likely I myself will continue using VC 2008 for at least two or more years. > > Unlike other C++ compiler, **Clang is not only a compiler, but also a great open source library**. An open source library should better keep lowest compiler/platform requirements, when possible. > > Also, seems Microsoft's development tools have quite long life and can spread into many years. For example, VC6 is not complete dead up to today... (some time ago I read on Reddit comment that somebody still needs to maintain a code base written with VC6). > I've got one project still using Visual Studio 2005. It has to be used because the resulting library was certified using a specific Microsoft runtime library. 
We can't change the platform because it would require a re-certification (the certification costs around 100K USD). As crazy as it sounds, we can install the software on a new version of Windows (Windows 7 or Windows 8), and the certification still applies. Jeff From qgyang at gmail.com Tue Jul 23 23:23:17 2013 From: qgyang at gmail.com (Qiao Yang) Date: Tue, 23 Jul 2013 23:23:17 -0700 Subject: [LLVMdev] Program compiled with Clang -pg and -O crashes with SEGFAULT Message-ID: Hi, I am trying to compile a simple program with Clang 3.3 on Linux and used -pg and -O2 option. The program would crash with segfault. Interestingly if I compile it with -pg option only it works. Do you have any idea why it crashes? And any workaround? $ cat myprog.c int main() { return 0; } $ clang -v -pg -O2 myprog.c clang version 3.3 (tags/RELEASE_33/final) Target: x86_64-pc-linux-gnu Thread model: posix "/usr/bin/clang" -cc1 -triple x86_64-pc-linux-gnu -emit-obj -disable-free -disable-llvm-verifier -main-file-name myprog.c -mrelocation-model static -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu x86-64 -target-linker-version 2.22 -momit-leaf-frame-pointer -v -resource-dir /usr/bin/../lib/clang/3.3 -internal-isystem /usr/local/include -internal-isystem /usr/bin/../lib/clang/3.3/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -O2 -fdebug-compilation-dir /home/vagrant/work/c++ -ferror-limit 19 -fmessage-length 204 -pg -mstackrealign -fobjc-runtime=gcc -fobjc-default-synthesize-properties -fdiagnostics-show-option -fcolor-diagnostics -backend-option -vectorize-loops -o /tmp/myprog-oJBSKs.o -x c myprog.c clang -cc1 version 3.3 based upon LLVM 3.3 default target x86_64-pc-linux-gnu ignoring nonexistent directory "/include" #include "..." 
search starts here: #include <...> search starts here: /usr/local/include /usr/bin/../lib/clang/3.3/include /usr/include/x86_64-linux-gnu /usr/include End of search list. "/usr/bin/ld" -z relro --hash-style=gnu --build-id --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o a.out /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/gcrt1.o /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/4.8/crtbegin.o -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu -L/lib/x86_64-linux-gnu -L/lib/../lib64 -L/usr/lib/x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/4.8/../../.. -L/lib -L/usr/lib /tmp/myprog-oJBSKs.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/x86_64-linux-gnu/4.8/crtend.o /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crtn.o Running this program it would give me stack trace: Program received signal SIGSEGV, Segmentation fault. mcount () at ../sysdeps/x86_64/_mcount.S:46 46 ../sysdeps/x86_64/_mcount.S: No such file or directory. (gdb) bt #0 mcount () at ../sysdeps/x86_64/_mcount.S:46 #1 0x00007ffff7dd6568 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x0000000000000000 in ?? () If I compare the assembly code with and without optimization, the only difference seems to be that the optimized version removed some register preservation ops before calling mcount(). 
-pg version, which works: ---------------------------------------- main: # @main .cfi_startproc # BB#0: pushq %rbp .Ltmp2: .cfi_def_cfa_offset 16 .Ltmp3: .cfi_offset %rbp, -16 movq %rsp, %rbp .Ltmp4: .cfi_def_cfa_register %rbp subq $16, %rsp callq mcount movl $0, %eax movl $0, -4(%rbp) addq $16, %rsp popq %rbp ret -pg -O2 version, which crashes: -------------------------------------------- main: # @main .cfi_startproc # BB#0: pushq %rax .Ltmp1: .cfi_def_cfa_offset 16 callq mcount xorl %eax, %eax popq %rdx ret For those who are familiar with assembly code, does it ring a bell? --Qiao From t.p.northover at gmail.com Tue Jul 23 23:37:46 2013 From: t.p.northover at gmail.com (Tim Northover) Date: Wed, 24 Jul 2013 07:37:46 +0100 Subject: [LLVMdev] Steps to addDestination In-Reply-To: References: Message-ID: Hi Rasha, On Wed, Jul 24, 2013 at 12:28 AM, Rasha Omar wrote: > 1- I need the first example. Oh good. > 2- I set the Address uninitialized according to the documentation > " Setting the name on the Value automatically updates the module's symbol > table" from Value.h source code That's referring to a string name, and is only really important for clarity and debugging the IR being produced. It means you could start out with something like [...] %1 = add i32 %lhs, %rhs ret i32 %1 [...] (where the %1 is just an automatically incrementing label provided by LLVM) then call MyAddInst.setName("theSum") and LLVM would automatically convert this to: %theSum = add i32 %lhs, %rhs ret i32 %theSum You still have to have an initialised, valid Value pointer to be able to do this. > 3- I'm not sure about "select" instruction, you mean that the address is the > new destination (basic block) that will be added The address will be, directly or indirectly, the result of a "BlockAddress::get(...)" call. Perhaps directly (though that would be rather useless since then you'd just as well create a direct branch to that block), perhaps via a "select" as in my code, or via load/store.
Perhaps even passed into the function as a parameter in a rather bizarre set of circumstances. Do you know about the Cpp backend, by the way? It can be very useful for working out just what you have to write to emit certain LLVM IR. What you do is write your own .ll file by hand, having the features you want, then run $ llc -march=cpp example_file.ll -o - LLVM will produce some C++ code that generates the module you wrote. It's not necessarily in the best of styles, but shows you roughly which calls you should be making. Cheers. Tim. From doob at me.com Tue Jul 23 23:49:17 2013 From: doob at me.com (Jacob Carlborg) Date: Wed, 24 Jul 2013 08:49:17 +0200 Subject: [LLVMdev] ubuntu on the mac In-Reply-To: <51EEF351.9030109@mips.com> References: <51EEF351.9030109@mips.com> Message-ID: On 2013-07-23 23:19, reed kotler wrote: > I have a new 11" mac air with lots of disk and ram. > > I'm trying to figure out the best way to configure it for llvm > development on ubuntu. > > Bootcamp or Virtual Box (or other Virtualization). > Bootcamp seems like it would be a nuisance. > > This is just not my main development machine but it's very portable. :) Do your LLVM development on Mac OS X :) It depends on what your needs are. Using VirtualBox will probably be the easiest. It also allows you to run both Mac OS X and Ubuntu simultaneously. The downside is that it will be slower than running Ubuntu natively. I say, try running in VirtualBox first. -- /Jacob Carlborg From tghardin1 at catamount.wcu.edu Wed Jul 24 00:47:47 2013 From: tghardin1 at catamount.wcu.edu (Tyler Hardin) Date: Wed, 24 Jul 2013 03:47:47 -0400 Subject: [LLVMdev] ubuntu on the mac In-Reply-To: References: <51EEF351.9030109@mips.com> Message-ID: On Jul 24, 2013 2:52 AM, "Jacob Carlborg" wrote: > > Do your LLVM development on Mac OS X :) Should work well. Apple is one of the bigger supporters of LLVM, so I'd hope OS X would be a suitable dev platform. > It depends on what your needs are. 
Using VirtualBox will probably be the easiest. It also allows you to run both Mac OS X and Ubuntu simultaneously. The downside is that it will be slower than running Ubuntu natively. > > I say, try running in VirtualBox first. Not much slower. VBox does an amazing job at getting near native performance on modern machines (those with nested paging etc.). This is definitely the best option if your computer has ~2g ram and 2+ cores. Give the Ubuntu VM 2g and 1 (maybe 2) core/s and it should be fine. Also, look into seamless mode. It lets you use windows opened in the VM in the host OS. That sounds vague. Just Google it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From David.Chisnall at cl.cam.ac.uk Wed Jul 24 01:03:08 2013 From: David.Chisnall at cl.cam.ac.uk (David Chisnall) Date: Wed, 24 Jul 2013 09:03:08 +0100 Subject: [LLVMdev] ubuntu on the mac In-Reply-To: References: <51EEF351.9030109@mips.com> Message-ID: <73A96E7A-A937-47E4-B421-EE45D7588A89@cl.cam.ac.uk> On 24 Jul 2013, at 08:47, Tyler Hardin wrote: > Not much slower. VBox does an amazing job at getting near native performance on modern machines (those with nested paging etc.). This is definitely the best option if your computer has ~2g ram and 2+ cores. Give the Ubuntu VM 2g and 1 (maybe 2) core/s and it should be fine. I use VirtualBox (hosting a FreeBSD VM) on a MacBook Pro for most of my LLVM development. > Also, look into seamless mode. It lets you use windows opened in the VM in the host OS. That sounds vague. Just Google it. Or set up a loopback network adaptor and ssh in. Most of the reason I use OS X is that their Terminal.app doesn't suck nearly as much as the other terminal emulators I've used. It happily persists directory state, so I have a little script that creates a temporary directory for each ssh session storing a tmux id and so I can resume terminal sessions across host machine reboots, without the windows moving. 
David From annulen at yandex.ru Wed Jul 24 02:14:03 2013 From: annulen at yandex.ru (Konstantin Tokarev) Date: Wed, 24 Jul 2013 13:14:03 +0400 Subject: [LLVMdev] ubuntu on the mac In-Reply-To: <73A96E7A-A937-47E4-B421-EE45D7588A89@cl.cam.ac.uk> References: <51EEF351.9030109@mips.com> <73A96E7A-A937-47E4-B421-EE45D7588A89@cl.cam.ac.uk> Message-ID: <827781374657243@web26f.yandex.ru> 24.07.2013, 12:31, "David Chisnall" : > On 24 Jul 2013, at 08:47, Tyler Hardin wrote: > >>  Not much slower. VBox does an amazing job at getting near native performance on modern machines (those with nested paging etc.). This is definitely the best option if your computer has ~2g ram and 2+ cores. Give the Ubuntu VM 2g and 1 (maybe 2) core/s and it should be fine. > > I use VirtualBox (hosting a FreeBSD VM) on a MacBook Pro for most of my LLVM development. > >>  Also, look into seamless mode. It lets you use windows opened in the VM in the host OS. That sounds vague. Just Google it. > > Or set up a loopback network adaptor and ssh in.  Most of the reason I use OS X is that their Terminal.app doesn't suck nearly as much as the other terminal emulators I've used.  It happily persists directory state, so I have a little script that creates a temporary directory for each ssh session storing a tmux id and so I can resume terminal sessions across host machine reboots, without the windows moving. Try iTerm2 with tmux integration. -- Regards, Konstantin From David.Chisnall at cl.cam.ac.uk Wed Jul 24 02:38:29 2013 From: David.Chisnall at cl.cam.ac.uk (David Chisnall) Date: Wed, 24 Jul 2013 10:38:29 +0100 Subject: [LLVMdev] ubuntu on the mac In-Reply-To: <827781374657243@web26f.yandex.ru> References: <51EEF351.9030109@mips.com> <73A96E7A-A937-47E4-B421-EE45D7588A89@cl.cam.ac.uk> <827781374657243@web26f.yandex.ru> Message-ID: <5495B317-9FC6-45D4-8628-B57019C8F931@cl.cam.ac.uk> On 24 Jul 2013, at 10:14, Konstantin Tokarev wrote: > Try iTerm2 with tmux integration. 
Off topic for this list, but I've tried iTerm and found a great many UI issues with it which, not to mention its propensity for crashing, make it far from usable for daily operation. Oh, and it only appeared to manage local tmux sessions, so it is useless for my use-case where I only care about remote ones. David From jay.foad at gmail.com Wed Jul 24 04:03:15 2013 From: jay.foad at gmail.com (Jay Foad) Date: Wed, 24 Jul 2013 12:03:15 +0100 Subject: [LLVMdev] uitofp and sitofp rounding mode Message-ID: When the uitofp and sitofp instructions convert e.g. from i64 to float, what rounding mode do they use? Answers in the form a patch to LangRef.html would be great! Thanks, Jay. From justin.holewinski at gmail.com Wed Jul 24 05:18:58 2013 From: justin.holewinski at gmail.com (Justin Holewinski) Date: Wed, 24 Jul 2013 08:18:58 -0400 Subject: [LLVMdev] ubuntu on the mac In-Reply-To: <51EEF351.9030109@mips.com> References: <51EEF351.9030109@mips.com> Message-ID: I've used both Ubuntu natively and Arch on a VirtualBox VM on a 2011 MBA. Unless you need native-speed GPU acceleration, I would recommend the VirtualBox route. You have to jump through a few hoops to get Ubuntu installed natively (install rEFIt, partition the drive), and I've read Ubuntu still has some issues with the 2013 MBAs [1]. Arch (and Ubuntu) in a VirtualBox VM has worked mostly flawlessly for me and is very easy to set up. Battery life takes a hit, but it's not outrageous. The biggest problem I've had is managing disk space on the puny 128GB SSD. [1] http://www.phoronix.com/scan.php?page=article&item=apple_mba2013_ubuntu&num=1 On Tue, Jul 23, 2013 at 5:19 PM, reed kotler wrote: > I have a new 11" mac air with lots of disk and ram. > > I'm trying to figure out the best way to configure it for llvm development > on ubuntu. > > Bootcamp or Virtual Box (or other Virtualization). > Bootcamp seems like it would be a nuisance. > > This is just not my main development machine but it's very portable.
:) > > > > ______________________________**_________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/**mailman/listinfo/llvmdev > -- Thanks, Justin Holewinski -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeremyhu at apple.com Wed Jul 24 10:11:39 2013 From: jeremyhu at apple.com (Jeremy Huddleston Sequoia) Date: Wed, 24 Jul 2013 10:11:39 -0700 Subject: [LLVMdev] Transitioning build to cmake Message-ID: <72BC7F3B-DFE5-425E-A185-3D7E1711874D@apple.com> I recently took a stab at changing the MacPorts llvm-3.4 port from the configure-based build system to the cmake-based build system. There are a couple of issues that I still haven't been able to work out yet and would like to know if these are just configuration issues on my side or bugs I should file at bugs.llvm.org: 1) libclang_rt It looks like the cmake build is missing some runtime libraries. How do I coerce it into building the right ones? -./opt/local/libexec/llvm-3.4/lib/clang/3.4/lib/darwin/libclang_rt.10.4.a ./opt/local/libexec/llvm-3.4/lib/clang/3.4/lib/darwin/libclang_rt.asan_osx_dynamic.dylib -./opt/local/libexec/llvm-3.4/lib/clang/3.4/lib/darwin/libclang_rt.cc_kext.a -./opt/local/libexec/llvm-3.4/lib/clang/3.4/lib/darwin/libclang_rt.eprintf.a -./opt/local/libexec/llvm-3.4/lib/clang/3.4/lib/darwin/libclang_rt.ios.a -./opt/local/libexec/llvm-3.4/lib/clang/3.4/lib/darwin/libclang_rt.osx.a -./opt/local/libexec/llvm-3.4/lib/clang/3.4/lib/darwin/libclang_rt.profile_ios.a +./opt/local/libexec/llvm-3.4/lib/clang/3.4/lib/darwin/libclang_rt.i386.a ./opt/local/libexec/llvm-3.4/lib/clang/3.4/lib/darwin/libclang_rt.profile_osx.a ./opt/local/libexec/llvm-3.4/lib/clang/3.4/lib/darwin/libclang_rt.ubsan_osx.a +./opt/local/libexec/llvm-3.4/lib/clang/3.4/lib/darwin/libclang_rt.x86_64.a 2) Universal This looks like a bug in cmake. I'm unable to build universal when setting CMAKE_OSX_ARCHITECTURES. 
cmake errors out because it only set the -arch command line arguments at link time: ... Building C object CMakeFiles/cmTryCompileExec3905760613.dir/testCCompiler.c.o /usr/bin/clang -Os -o CMakeFiles/cmTryCompileExec3905760613.dir/testCCompiler.c.o -c /opt/local/var/macports/build/_Users_jeremy_src_macports_trunk_dports_lang_llvm-3.4/llvm-3.4/work/build/CMakeFiles/CMakeTmp/testCCompiler.c Linking C executable cmTryCompileExec3905760613 /opt/local/bin/cmake -E cmake_link_script CMakeFiles/cmTryCompileExec3905760613.dir/link.txt --verbose=1 /usr/bin/clang -Os -Wl,-search_paths_first -Wl,-headerpad_max_install_names -L/opt/local/lib -Wl,-headerpad_max_install_names -arch x86_64 -arch i386 CMakeFiles/cmTryCompileExec3905760613.dir/testCCompiler.c.o -o cmTryCompileExec3905760613 ld: warning: ignoring file CMakeFiles/cmTryCompileExec3905760613.dir/testCCompiler.c.o, file was built for unsupported file format ( 0xCF 0xFA 0xED 0xFE 0x07 0x00 0x00 0x01 0x03 0x00 0x00 0x00 0x01 0x00 0x00 0x00 ) which is not the architecture being linked (i386): CMakeFiles/cmTryCompileExec3905760613.dir/testCCompiler.c.o Undefined symbols for architecture i386: ... 3) Shared library The build fails if I try to build llvm using BUILD_SHARED_LIBS=ON ... the issue is that when the build tries to use the tools, dyld can't find libllvm-3.4svn.dylib because it's not yet installed. 4) Building clang using installed llvm It looks like there is some support for building clang against an installed llvm by setting CLANG_PATH_TO_LLVM_BUILD. This fails miserably in part because the installed llvm cmake files reference build time paths, but even after fixing that, there are tons of build failures. I'm guessing this is still a work in progress, but if I should file bugs, please let me know. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 4136 bytes Desc: not available URL: From baldrick at free.fr Wed Jul 24 10:51:32 2013 From: baldrick at free.fr (Duncan Sands) Date: Wed, 24 Jul 2013 19:51:32 +0200 Subject: [LLVMdev] Program compiled with Clang -pg and -O crashes with SEGFAULT In-Reply-To: References: Message-ID: <51F01424.6010908@free.fr> Hi Qiao, On 24/07/13 08:23, Qiao Yang wrote: > Hi, > > I am trying to compile a simple program with Clang 3.3 on Linux and used -pg and -O2 option. The program would crash with segfault. Interestingly if I compile it with -pg option only it works. Do you have any idea why it crashes? And any workaround? > > $ cat myprog.c > int main() { > return 0; > } > > $ clang -v -pg -O2 myprog.c if you compile with -fno-omit-frame-pointer, the crash goes away, right? > clang version 3.3 (tags/RELEASE_33/final) > Target: x86_64-pc-linux-gnu > Thread model: posix > "/usr/bin/clang" -cc1 -triple x86_64-pc-linux-gnu -emit-obj -disable-free -disable-llvm-verifier -main-file-name myprog.c -mrelocation-model static -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu x86-64 -target-linker-version 2.22 -momit-leaf-frame-pointer -v -resource-dir /usr/bin/../lib/clang/3.3 -internal-isystem /usr/local/include -internal-isystem /usr/bin/../lib/clang/3.3/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -O2 -fdebug-compilation-dir /home/vagrant/work/c++ -ferror-limit 19 -fmessage-length 204 -pg -mstackrealign -fobjc-runtime=gcc -fobjc-default-synthesize-properties -fdiagnostics-show-option -fcolor-diagnostics -backend-option -vectorize-loops -o /tmp/myprog-oJBSKs.o -x c myprog.c > clang -cc1 version 3.3 based upon LLVM 3.3 default target x86_64-pc-linux-gnu > ignoring nonexistent directory "/include" > #include "..." 
search starts here: > #include <...> search starts here: > /usr/local/include > /usr/bin/../lib/clang/3.3/include > /usr/include/x86_64-linux-gnu > /usr/include > End of search list. > "/usr/bin/ld" -z relro --hash-style=gnu --build-id --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o a.out /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/gcrt1.o /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/4.8/crtbegin.o -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu -L/lib/x86_64-linux-gnu -L/lib/../lib64 -L/usr/lib/x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/4.8/../../.. -L/lib -L/usr/lib /tmp/myprog-oJBSKs.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/x86_64-linux-gnu/4.8/crtend.o /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crtn.o > > Running this program it would give me stack trace: > Program received signal SIGSEGV, Segmentation fault. > mcount () at ../sysdeps/x86_64/_mcount.S:46 > 46 ../sysdeps/x86_64/_mcount.S: No such file or directory. > (gdb) bt > #0 mcount () at ../sysdeps/x86_64/_mcount.S:46 > #1 0x00007ffff7dd6568 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 > #2 0x0000000000000000 in ?? () Here you see that it crashes inside mcount. I think mcount walks up the call stack in order to generate a call graph. Since it crashes when there is no frame pointer it seems like this is an mcount bug, and it should really be updated to one of the more modern methods that doesn't require a frame pointer. > > If I compare the assembly code with and without optimization, the only difference seems to be that the optimized version removed some register preservation ops before calling mcount(). 
> > -pg version, which works: > ---------------------------------------- > main: # @main > .cfi_startproc > # BB#0: > pushq %rbp ^ frame pointer > .Ltmp2: > .cfi_def_cfa_offset 16 > .Ltmp3: > .cfi_offset %rbp, -16 > movq %rsp, %rbp > .Ltmp4: > .cfi_def_cfa_register %rbp > subq $16, %rsp > callq mcount > movl $0, %eax > movl $0, -4(%rbp) > addq $16, %rsp > popq %rbp > ret > > -pg -O2 version, which crashes: > -------------------------------------------- > main: # @main > .cfi_startproc > # BB#0: > pushq %rax > .Ltmp1: > .cfi_def_cfa_offset 16 ^ no frame pointer Ciao, Duncan. > callq mcount > xorl %eax, %eax > popq %rdx > ret > > For those who are familiar with assembly code, does it ring some bell? > > --Qiao > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From qgyang at gmail.com Wed Jul 24 11:16:11 2013 From: qgyang at gmail.com (Qiao Yang) Date: Wed, 24 Jul 2013 11:16:11 -0700 Subject: [LLVMdev] Program compiled with Clang -pg and -O crashes with SEGFAULT In-Reply-To: <51F01424.6010908@free.fr> References: <51F01424.6010908@free.fr> Message-ID: <41663217-E260-4A99-984B-183F55C8F961@gmail.com> Duncan, Yes, that is exactly the issue. It no longer crashes with -fno-omit-frame-pointer. Thanks a lot. --Qiao On Jul 24, 2013, at 10:51 AM, Duncan Sands wrote: > Hi Qiao, > > On 24/07/13 08:23, Qiao Yang wrote: >> Hi, >> >> I am trying to compile a simple program with Clang 3.3 on Linux and used -pg and -O2 option. The program would crash with segfault. Interestingly if I compile it with -pg option only it works. Do you have any idea why it crashes? And any workaround? >> >> $ cat myprog.c >> int main() { >> return 0; >> } >> >> $ clang -v -pg -O2 myprog.c > > if you compile with -fno-omit-frame-pointer, the crash goes away, right? 
> >> clang version 3.3 (tags/RELEASE_33/final) >> Target: x86_64-pc-linux-gnu >> Thread model: posix >> "/usr/bin/clang" -cc1 -triple x86_64-pc-linux-gnu -emit-obj -disable-free -disable-llvm-verifier -main-file-name myprog.c -mrelocation-model static -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu x86-64 -target-linker-version 2.22 -momit-leaf-frame-pointer -v -resource-dir /usr/bin/../lib/clang/3.3 -internal-isystem /usr/local/include -internal-isystem /usr/bin/../lib/clang/3.3/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -O2 -fdebug-compilation-dir /home/vagrant/work/c++ -ferror-limit 19 -fmessage-length 204 -pg -mstackrealign -fobjc-runtime=gcc -fobjc-default-synthesize-properties -fdiagnostics-show-option -fcolor-diagnostics -backend-option -vectorize-loops -o /tmp/myprog-oJBSKs.o -x c myprog.c >> clang -cc1 version 3.3 based upon LLVM 3.3 default target x86_64-pc-linux-gnu >> ignoring nonexistent directory "/include" >> #include "..." search starts here: >> #include <...> search starts here: >> /usr/local/include >> /usr/bin/../lib/clang/3.3/include >> /usr/include/x86_64-linux-gnu >> /usr/include >> End of search list. >> "/usr/bin/ld" -z relro --hash-style=gnu --build-id --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o a.out /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/gcrt1.o /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/4.8/crtbegin.o -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu -L/lib/x86_64-linux-gnu -L/lib/../lib64 -L/usr/lib/x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/4.8/../../.. 
-L/lib -L/usr/lib /tmp/myprog-oJBSKs.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/x86_64-linux-gnu/4.8/crtend.o /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crtn.o >> >> Running this program it would give me stack trace: >> Program received signal SIGSEGV, Segmentation fault. >> mcount () at ../sysdeps/x86_64/_mcount.S:46 >> 46 ../sysdeps/x86_64/_mcount.S: No such file or directory. >> (gdb) bt >> #0 mcount () at ../sysdeps/x86_64/_mcount.S:46 >> #1 0x00007ffff7dd6568 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 >> #2 0x0000000000000000 in ?? () > > Here you see that it crashes inside mcount. I think mcount walks up the call > stack in order to generate a call graph. Since it crashes when there is no > frame pointer it seems like this is an mcount bug, and it should really be > updated to one of the more modern methods that doesn't require a frame pointer. > >> >> If I compare the assembly code with and without optimization, the only difference seems to be that the optimized version removed some register preservation ops before calling mcount(). >> >> -pg version, which works: >> ---------------------------------------- >> main: # @main >> .cfi_startproc >> # BB#0: >> pushq %rbp > > ^ frame pointer > >> .Ltmp2: >> .cfi_def_cfa_offset 16 >> .Ltmp3: >> .cfi_offset %rbp, -16 >> movq %rsp, %rbp >> .Ltmp4: >> .cfi_def_cfa_register %rbp >> subq $16, %rsp >> callq mcount >> movl $0, %eax >> movl $0, -4(%rbp) >> addq $16, %rsp >> popq %rbp >> ret >> >> -pg -O2 version, which crashes: >> -------------------------------------------- >> main: # @main >> .cfi_startproc >> # BB#0: >> pushq %rax >> .Ltmp1: >> .cfi_def_cfa_offset 16 > > ^ no frame pointer > > Ciao, Duncan. > >> callq mcount >> xorl %eax, %eax >> popq %rdx >> ret >> >> For those who are familiar with assembly code, does it ring some bell? 
>> >> --Qiao >> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From eli.friedman at gmail.com Wed Jul 24 12:00:25 2013 From: eli.friedman at gmail.com (Eli Friedman) Date: Wed, 24 Jul 2013 12:00:25 -0700 Subject: [LLVMdev] uitofp and sitofp rounding mode In-Reply-To: References: Message-ID: On Wed, Jul 24, 2013 at 4:03 AM, Jay Foad wrote: > When the uitofp and sitofp instructions convert e.g. from i64 to > float, what rounding mode do they use? Answers in the form a patch to > LangRef.html would be great! The default rounding mode, just like every other floating-point operation. -Eli From resistor at mac.com Wed Jul 24 12:05:21 2013 From: resistor at mac.com (Owen Anderson) Date: Wed, 24 Jul 2013 12:05:21 -0700 Subject: [LLVMdev] uitofp and sitofp rounding mode In-Reply-To: References: Message-ID: <69F77C9D-384D-4A05-A796-5BAE76275754@mac.com> On Jul 24, 2013, at 12:00 PM, Eli Friedman wrote: > On Wed, Jul 24, 2013 at 4:03 AM, Jay Foad wrote: >> When the uitofp and sitofp instructions convert e.g. from i64 to >> float, what rounding mode do they use? Answers in the form a patch to >> LangRef.html would be great! > > The default rounding mode, just like every other floating-point operation. Except, of course, that the default rounding mode for an FP->integer conversion is different than the default rounding mode on, say, an FADD. FP->integer conversions are round-to-zero, while FP arithmetic operations are round-nearest-ties-to-even. 
--Owen From chisophugis at gmail.com Wed Jul 24 12:40:59 2013 From: chisophugis at gmail.com (Sean Silva) Date: Wed, 24 Jul 2013 12:40:59 -0700 Subject: [LLVMdev] Transitioning build to cmake In-Reply-To: <72BC7F3B-DFE5-425E-A185-3D7E1711874D@apple.com> References: <72BC7F3B-DFE5-425E-A185-3D7E1711874D@apple.com> Message-ID: On Wed, Jul 24, 2013 at 10:11 AM, Jeremy Huddleston Sequoia < jeremyhu at apple.com> wrote: > I recently took a stab at changing the MacPorts llvm-3.4 port from the > configure-based build system to the cmake-based build system. > Thanks for working on this! > 4) Building clang using installed llvm > > It looks like there is some support for building clang against an > installed llvm by setting CLANG_PATH_TO_LLVM_BUILD. This fails miserably > in part because the installed llvm cmake files reference build time paths, > but even after fixing that, there are tons of build failures. I'm guessing > this is still a work in progress, but if I should file bugs, please let me > know. This is probably not a very good idea because clang evolves in lock-step with LLVM. Unless the installed LLVM is the same revision as the clang you are building, things are likely to not work due to internal API changes. The option you cite is more likely intended for when you build clang in a directory separate from LLVM (rather than when it is in llvm/tools/clang/, where things just work) but both are still checked out at the same revision. -- Sean Silva -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeremyhu at apple.com Wed Jul 24 13:03:20 2013 From: jeremyhu at apple.com (Jeremy Huddleston Sequoia) Date: Wed, 24 Jul 2013 13:03:20 -0700 Subject: [LLVMdev] Transitioning build to cmake In-Reply-To: References: <72BC7F3B-DFE5-425E-A185-3D7E1711874D@apple.com> Message-ID: Sent from my iPhone... 
> On Jul 24, 2013, at 12:40, Sean Silva wrote: > > >> On Wed, Jul 24, 2013 at 10:11 AM, Jeremy Huddleston Sequoia wrote: >> I recently took a stab at changing the MacPorts llvm-3.4 port from the configure-based build system to the cmake-based build system. > > Thanks for working on this! No prob. Hopefully we can iron out these wrinkles. >> 4) Building clang using installed llvm >> >> It looks like there is some support for building clang against an installed llvm by setting CLANG_PATH_TO_LLVM_BUILD. This fails miserably in part because the installed llvm cmake files reference build time paths, but even after fixing that, there are tons of build failures. I'm guessing this is still a work in progress, but if I should file bugs, please let me know. > > This is probably not a very good idea because clang evolves in lock-step with LLVM. Unless the installed LLVM is the same revision as the clang you are building, They match revision. We just need to split the build into two subports to break dependency cycles, and I'm hoping to avoid rebuilding the core libraries a second time. > things are likely to not work due to internal API changes. The option you cite is more likely intended for when you build clang in a directory separate from LLVM (rather than when it is in llvm/tools/clang/, where things just work) but both are still checked out at the same revision. > > -- Sean Silva -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chisophugis at gmail.com Wed Jul 24 13:07:32 2013 From: chisophugis at gmail.com (Sean Silva) Date: Wed, 24 Jul 2013 13:07:32 -0700 Subject: [LLVMdev] Transitioning build to cmake In-Reply-To: References: <72BC7F3B-DFE5-425E-A185-3D7E1711874D@apple.com> Message-ID: On Wed, Jul 24, 2013 at 1:03 PM, Jeremy Huddleston Sequoia < jeremyhu at apple.com> wrote: > 4) Building clang using installed llvm >> >> It looks like there is some support for building clang against an >> installed llvm by setting CLANG_PATH_TO_LLVM_BUILD. This fails miserably >> in part because the installed llvm cmake files reference build time paths, >> but even after fixing that, there are tons of build failures. I'm guessing >> this is still a work in progress, but if I should file bugs, please let me >> know. > > > This is probably not a very good idea because clang evolves in lock-step > with LLVM. Unless the installed LLVM is the same revision as the clang you > are building, > > > They match revision. We just need to split the build into two subports to > break dependency cycles, and I'm hoping to avoid rebuilding the core > libraries a second time. > > Ah, OK. Definitely a bug in our build system then. -- Sean Silva -------------- next part -------------- An HTML attachment was scrubbed... URL: From Andrea_DiBiagio at sn.scee.net Wed Jul 24 13:12:59 2013 From: Andrea_DiBiagio at sn.scee.net (Andrea_DiBiagio at sn.scee.net) Date: Wed, 24 Jul 2013 21:12:59 +0100 Subject: [LLVMdev] [cfe-dev] [RFC] add Function Attribute to disable optimization In-Reply-To: References: <51BF7516.4070800@mxc.ca> Message-ID: > From: Sean Silva > Here are the problems found: > 1) It is not safe to disable some transform passes in the backend. > It looks like there are some unwritten dependences between passes and > disrupting the sequence of passes to run may result in unexpected crashes > and/or assertion failures; > > This sounds like a bug. 
It's probably worth bringing up as its own > discussion on llvmdev if it is extremely prevalent, or file PR's (or > send patches fixing it!) if it is just a few isolated cases. It looks like the problem was in my code and it was caused by the lack of an update to the set of preserved analyses; as soon as the problem was fixed, all the unexpected failures disappeared. > 2) The fact that pass managers are not currently designed to support > per-function optimization makes it difficult to find a reasonable way to > implement this new feature. > > About point 2, the idea that came to my mind consisted in making passes > aware of the 'noopt' attribute. > In my experiment: > - I added a virtual method called 'mustAlwaysRun' in class Pass that > 'returns true if it is not safe to disable this pass'. > If a pass does not override the default implementation of that method, > then by default it will always return true (i.e. the pass "must > always run" even when attribute 'noopt' is specified). > - I then overrode that method in all the optimization passes > that could safely be turned off when attribute noopt was present. > In my experiment, I specifically didn't disable Module Passes; > - Then I modified the 'doInitialize()', 'run*()' and 'doFinalize()' methods > in the Pass Manager to check for both the presence of attribute noopt AND the > value returned by method 'mustAlwaysRun' called on the current pass > instance. > > That experiment seemed to "work" on a few tests and benchmarks. > However: > a) 'noopt' wouldn't really imply no optimization, since not all codegen > optimization passes can be safely disabled. As a result, the assembly > produced for noopt functions had only a few differences with respect to the > assembly generated for the same functions at -O0; > b) I don't particularly like the idea of making passes "aware" of the > 'noopt' attribute.
However, I don't know if there is a reasonable way to > implement the noopt attribute without having to re-design how pass > managers work. > > A redesign of the pass manager has been on the table for a while and > the need seems more pressing daily. Definitely make sure that this > use case is brought up in any design discussions for the new pass manager. > > -- Sean Silva Sure, I will definitely bring this up as soon as there is a discussion about the pass manager. As I mentioned in my previous post, there could be a way to implement the 'noopt' attribute without having to redesign the pass manager. However, it would require that we explicitly mark function passes that are expected not to run on a function marked as 'noopt'. That would guide the pass manager when selecting which passes to initialize/run/finalize on a 'noopt' function. What I don't like about that approach is the fact that passes would know about the noopt flag. Not only would it require changes in most of the existing passes, but it would also force people who develop optimization passes to know about the existence of 'noopt' when they define their pass. My questions are: would it be reasonable/acceptable at this point to implement this proposal before the pass manager is redesigned? If so, would a solution like the one described above be acceptable? Or are there better solutions? Any feedback (either good or bad) would be really appreciated at this point. Thanks, Andrea Di Biagio SN Systems - Sony Computer Entertainment Group ********************************************************************** This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify postmaster at scee.net This footnote also confirms that this email message has been checked for all known viruses.
Sony Computer Entertainment Europe Limited Registered Office: 10 Great Marlborough Street, London W1F 7LP, United Kingdom Registered in England: 3277793 ********************************************************************** Please consider the environment before printing this e-mail From ctc.liu at gmail.com Wed Jul 24 12:17:29 2013 From: ctc.liu at gmail.com (Chenyang Liu) Date: Wed, 24 Jul 2013 12:17:29 -0700 Subject: [LLVMdev] Pre-RA scheduler details Message-ID: Hi, I'm interested in the two pre-RA instruction schedulers used in LLVM, list-hybrid and list-ilp. I've done some digging on the internet and played around with executing some test files using the two schedulers. However, I'm still uncertain of the behaviors and heuristics used in each. For example, the XXXX_ls_rr_sort::isReady for hybrid includes a 3 cycle ReadyDelay (seems arbitrary) whereas the ilp version readies all instructions once data dependencies are resolved. Additionally, the ilp_ls_rr_sort::operator() just calls BURRSort at the end after applying a bunch of heuristics to the queue beforehand. Reading it is quite laborious (very few comments) and I was wondering if anyone had any references to what exactly it is doing and why it is doing it in such an order. I'm focused right now on understanding the behavior of the two schedulers. If anyone could give me some direction (papers, slides) that would be greatly appreciated. Chenyang Liu -------------- next part -------------- An HTML attachment was scrubbed... URL: From echristo at gmail.com Wed Jul 24 14:10:10 2013 From: echristo at gmail.com (Eric Christopher) Date: Wed, 24 Jul 2013 14:10:10 -0700 Subject: [LLVMdev] Deprecating and removing the MBlaze backend Message-ID: Doesn't seem to get a lot of love since most of the commits in the last 3 years have been maintenance. I guess it doesn't take a whole lot of maintenance either, but... cc'ing Wesley since he seems to be the last guy to commit to it. Thoughts?
-eric From cdavis5x at gmail.com Wed Jul 24 14:18:55 2013 From: cdavis5x at gmail.com (Charles Davis) Date: Wed, 24 Jul 2013 15:18:55 -0600 Subject: [LLVMdev] Transitioning build to cmake In-Reply-To: <72BC7F3B-DFE5-425E-A185-3D7E1711874D@apple.com> References: <72BC7F3B-DFE5-425E-A185-3D7E1711874D@apple.com> Message-ID: On Jul 24, 2013, at 11:11 AM, Jeremy Huddleston Sequoia wrote: > I recently took a stab at changing the MacPorts llvm-3.4 port from the configure-based build system to the cmake-based build system. > > There are a couple of issues that I still haven't been able to work out yet and would like to know if these are just configuration issues on my side or bugs I should file at bugs.llvm.org: > > tl;dr: LLVM CMake support is primarily designed for Windows/Visual Studio (especially in LLVM proper) and Linux (especially in compiler-rt), and needs lots of work to work well on Mac OS X or anywhere else. In particular, it is missing many features that are present in the autotools build. (Though, as the CMake proponents 'round here are quick to point out, the autotools system is itself missing some features that are present in CMake.) > 1) libclang_rt > > It looks like the cmake build is missing some runtime libraries. How do I coerce it into building the right ones? 
> > -./opt/local/libexec/llvm-3.4/lib/clang/3.4/lib/darwin/libclang_rt.10.4.a > ./opt/local/libexec/llvm-3.4/lib/clang/3.4/lib/darwin/libclang_rt.asan_osx_dynamic.dylib > -./opt/local/libexec/llvm-3.4/lib/clang/3.4/lib/darwin/libclang_rt.cc_kext.a > -./opt/local/libexec/llvm-3.4/lib/clang/3.4/lib/darwin/libclang_rt.eprintf.a > -./opt/local/libexec/llvm-3.4/lib/clang/3.4/lib/darwin/libclang_rt.ios.a > -./opt/local/libexec/llvm-3.4/lib/clang/3.4/lib/darwin/libclang_rt.osx.a > -./opt/local/libexec/llvm-3.4/lib/clang/3.4/lib/darwin/libclang_rt.profile_ios.a > +./opt/local/libexec/llvm-3.4/lib/clang/3.4/lib/darwin/libclang_rt.i386.a > ./opt/local/libexec/llvm-3.4/lib/clang/3.4/lib/darwin/libclang_rt.profile_osx.a > ./opt/local/libexec/llvm-3.4/lib/clang/3.4/lib/darwin/libclang_rt.ubsan_osx.a > +./opt/local/libexec/llvm-3.4/lib/clang/3.4/lib/darwin/libclang_rt.x86_64.a The CMake support in compiler-rt evolved in a completely different direction from the Makefiles; it was primarily designed originally, IIRC, to support building asan (and later, the other sanitizers), and mostly on Linux at that. Other platforms and configurations were an afterthought. It needs serious work--in particular, the runtime libraries are built with the host compiler (not usually a problem, unless you're building a cross compiler), and (as you've probably noticed by now) it doesn't make fat archives. Patches welcome if you can speak CMake ;). > > > 2) Universal > > This looks like a bug in cmake. I'm unable to build universal when setting CMAKE_OSX_ARCHITECTURES. cmake errors out because it only sets the -arch command line arguments at link time: > > ...
> Building C object > CMakeFiles/cmTryCompileExec3905760613.dir/testCCompiler.c.o > > /usr/bin/clang -Os -o > CMakeFiles/cmTryCompileExec3905760613.dir/testCCompiler.c.o -c > /opt/local/var/macports/build/_Users_jeremy_src_macports_trunk_dports_lang_llvm-3.4/llvm-3.4/work/build/CMakeFiles/CMakeTmp/testCCompiler.c > > > Linking C executable cmTryCompileExec3905760613 > > /opt/local/bin/cmake -E cmake_link_script > CMakeFiles/cmTryCompileExec3905760613.dir/link.txt --verbose=1 > > /usr/bin/clang -Os -Wl,-search_paths_first -Wl,-headerpad_max_install_names > -L/opt/local/lib -Wl,-headerpad_max_install_names -arch x86_64 -arch i386 > CMakeFiles/cmTryCompileExec3905760613.dir/testCCompiler.c.o -o > cmTryCompileExec3905760613 > > ld: warning: ignoring file > CMakeFiles/cmTryCompileExec3905760613.dir/testCCompiler.c.o, file was built > for unsupported file format ( 0xCF 0xFA 0xED 0xFE 0x07 0x00 0x00 0x01 0x03 > 0x00 0x00 0x00 0x01 0x00 0x00 0x00 ) which is not the architecture being > linked (i386): CMakeFiles/cmTryCompileExec3905760613.dir/testCCompiler.c.o > > Undefined symbols for architecture i386: > ... Well that's odd. I have CMake from trunk installed, and I was able to use it to build a very simple project with one C source universal. I was also able to build CMake itself universal. There's no bug in CMake--at least, not anymore. There might, however, be a bug in LLVM's build system that's causing this. > > > 3) Shared library > > The build fails if I try to build llvm using BUILD_SHARED_LIBS=ON ... the issue is that when the build tries to use the tools, dyld can't find libllvm-3.4svn.dylib because it's not yet installed. The CMake build obviously needs to be taught to set the install names to relative paths on Mac OS, like the autotools build does. 
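A hedged aside on the install-name point above: CMake can bake an install name into a dylib through the INSTALL_NAME_DIR target property. The target name and path below are illustrative assumptions only, not the actual change the LLVM build needs:

```cmake
# Sketch: give a shared library an @executable_path-relative install name
# on Darwin, mirroring what the autotools build does. "LLVMSupport" and the
# path are placeholder assumptions, not the real LLVM CMake targets.
if(APPLE AND BUILD_SHARED_LIBS)
  set_target_properties(LLVMSupport PROPERTIES
    INSTALL_NAME_DIR "@executable_path/../lib")
endif()
```

At install time CMake rewrites the dylib's recorded install name (and the references in binaries that link it) to the given directory, which dyld then resolves relative to the running executable.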
Chip From micah.villmow at smachines.com Wed Jul 24 14:28:19 2013 From: micah.villmow at smachines.com (Micah Villmow) Date: Wed, 24 Jul 2013 21:28:19 +0000 Subject: [LLVMdev] Deprecating and removing the MBlaze backend In-Reply-To: References: Message-ID: <3947CD34E13C4F4AB2D94AD35AE3FE600708605C@smi-exchange1.smi.local> Chandler brought up removing it back in February but Rogelio Serrano said he could maintain it and Jeff Fifield from Xilinx was supposed to check if someone could help. If no one has stepped up in the past 5 months, then I don't see an issue with removing it. Micah > -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Eric Christopher > Sent: Wednesday, July 24, 2013 2:10 PM > To: peckw at wesleypeck.com > Cc: llvmdev at cs.uiuc.edu > Subject: [LLVMdev] Deprecating and removing the MBlaze backend > > Doesn't seem to get a lot of love since most of the commits in the last 3 years > have been maintenance. I guess it doesn't take a whole lot of maintenance > either, but... > > cc'ing Wesley since he seems to be the last guy to commit to it. > > Thoughts? > > -eric > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From fifield at mtnhigh.net Wed Jul 24 14:35:29 2013 From: fifield at mtnhigh.net (Jeff Fifield) Date: Wed, 24 Jul 2013 15:35:29 -0600 Subject: [LLVMdev] Deprecating and removing the MBlaze backend In-Reply-To: References: Message-ID: So this came up a few months ago and I briefly tried to drum up support within Xilinx to maintain/improve the MBlaze backend. I also put it on my todo list to starting doing some of this myself if needed. It now sounds like I should increase the priority of these tasks. My request is to wait another couple of months before moving to deprecate and remove. 
If I can't generate any activity by then, go for it -- I completely understand that a backend without a maintainer is no good. Thanks, Jeff On Wed, Jul 24, 2013 at 3:10 PM, Eric Christopher wrote: > Doesn't seem to get a lot of love since most of the commits in the > last 3 years have been maintenance. I guess it doesn't take a whole > lot of maintenance either, but... > > cc'ing Wesley since he seems to be the last guy to commit to it. > > Thoughts? > > -eric > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rasha.sala7 at gmail.com Wed Jul 24 15:01:47 2013 From: rasha.sala7 at gmail.com (Rasha Omar) Date: Thu, 25 Jul 2013 00:01:47 +0200 Subject: [LLVMdev] Steps to addDestination In-Reply-To: References: Message-ID: Hi 1- for (rit = Result.begin(); rit != Result.end(); ++rit) { Value *Address = BlockAddress::get(*rit); IndirectBrInst *IBI = IndirectBrInst::Create(Address, Result.size(), i->getTerminator()); IBI->addDestination(*rit); } I tried this code, but the needed destination wasn't added. 2- About LLVM backend $ llc -march=cpp example_file.ll -o - I think it will be so helpful for my work Thanks On 24 July 2013 08:37, Tim Northover wrote: > Hi Rasha, > > On Wed, Jul 24, 2013 at 12:28 AM, Rasha Omar > wrote: > > 1- I need the first example. > > Oh good. > > 2- I set the Address uninitialized according to the documentation > > " Setting the name on the Value automatically updates the module's symbol > > table" from Value.h source code > > That's referring to a string name, and is only really important for > clarity and debugging the IR being produced. It means you could start > out with something like > > [...]
> (where the %1 is just an automatically incrementing label provided by > LLVM) then call MyAddInst.setName("theSum") and LLVM would > automatically convert this to: > > %theSum = add i32 %lhs, %rhs > ret i32 %theSum > > You still have to have an initialised, valid Value pointer to be able > to do this. > > > 3- I'm not sure about the "select" instruction, you mean that the address is > the > > new destination (basic block) that will be added > > The address will be, directly or indirectly, the result of a > "BlockAddress::get(...)" call. Perhaps directly (though that would be > rather useless, since then you could just as well create a direct branch to > that block), perhaps via a "select" as in my code, or via load/store. > Perhaps even passed into the function as a parameter in a rather > bizarre set of circumstances. > > Do you know about the Cpp backend, by the way? It can be very useful > for working out just what you have to write to emit certain LLVM IR. > > What you do is write your own .ll file by hand, having the features > you want, then run
URL: From bob.wilson at apple.com Wed Jul 24 15:14:11 2013 From: bob.wilson at apple.com (Bob Wilson) Date: Wed, 24 Jul 2013 15:14:11 -0700 Subject: [LLVMdev] -Os In-Reply-To: References: <51EE6A33.2010203@mips.com> <3431374581995@web12h.yandex.ru> <51EE756F.80503@mips.com> <6B918A28-DDB7-4D3A-B446-C717B702A3FB@apple.com> Message-ID: On Jul 23, 2013, at 3:59 PM, Jim Grosbach wrote: > > On Jul 23, 2013, at 3:40 PM, Renato Golin wrote: > >> On 23 July 2013 19:36, Jim Grosbach wrote: >> This isn’t just a nitpick. This is exactly why you’re seeing differences. The pass managers aren’t always set up the same, for example. >> >> FWIW, I feel your pain. This is a long-standing weakness of our infrastructure. >> >> Jim, >> >> A while ago I proposed that we annotated the options the front-end passed to the back-end on the IR with named metadata, but it didn't catch on. >> >> Would it make sense to have some call-back mechanism while setting back-end flags to keep a tab on what's called and have a dump as metadata, so that you can just write it to the IR file at the end? More or less what we have for functions, currently. >> >> This would hint llc, lli and others to what flags it must set itself (architecture, optimizations, etc) and would minimize the impact of split compilation. Those tools are free to ignore any option it doesn't recognize, of course, as with any metadata. >> >> Another way would be to teach llc, lli and others all command line options of all supported front-ends, but that wouldn't be very productive, I think. > > Maybe? I’m not sure what a good answer is here. Bob and Bill have both looked at this in more detail than I. What do you guys think? The plan is that all (or at least almost all) of those options will move to be function attributes. The backend will be changed to use those attributes instead of command line options. That should solve this problem. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rkotler at mips.com Wed Jul 24 15:45:05 2013 From: rkotler at mips.com (reed kotler) Date: Wed, 24 Jul 2013 15:45:05 -0700 Subject: [LLVMdev] static functions and optimization Message-ID: <51F058F1.2010100@mips.com> I have some stub functions that are essentially static, but they cannot be removed. What linkage type should I use in that case? Internal linkage appears to get the functions discarded if they are not referenced under certain optimizations. This is part of the gcc mips16 hack for floating point. For example, a function like floor from the math library will be called with an external reference to function floor. At that time, the compiler does not know whether floor was compiled as mips16 or mips32. It generates the call to floor as if it is a mips16 compiled function. It also generates a stub that the linker can use if during link time it is discovered that "floor" is a mips32 function. The stubs, which are mips32 code, will move the first argument register into a floating point register, call floor, and upon return will move the integer return value into a floating point register. If I designate this function as having internal linkage, then in some cases, the optimizer will throw it away. In that case, at link time it will call floor, and if floor is compiled as mips32, this will break a mips16 compiled program. The linker does not know there is an issue because only functions with certain kinds of signatures need a helper function for the call. From rkotler at mips.com Wed Jul 24 16:07:40 2013 From: rkotler at mips.com (Reed Kotler) Date: Wed, 24 Jul 2013 16:07:40 -0700 Subject: [LLVMdev] static functions and optimization In-Reply-To: <51F058F1.2010100@mips.com> References: <51F058F1.2010100@mips.com> Message-ID: <51F05E3C.4080807@mips.com> Maybe there is some attribute I can add that will not allow the function to be discarded.
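One mechanism that fits this request is the llvm.used list: globals named there are treated as live roots, so dead-global elimination will not discard them even at internal linkage. A sketch in typed-pointer IR of this era; the stub's name and (empty) body are placeholders, not the real mips16 glue code:

```llvm
; An internal helper that must survive optimization even when unreferenced.
define internal void @__call_stub_floor() {
entry:
  ; ... mips32 glue code would go here ...
  ret void
}

; Anything listed in @llvm.used is "used" as far as the optimizers are
; concerned, so GlobalDCE and friends leave it alone.
@llvm.used = appending global [1 x i8*]
  [i8* bitcast (void ()* @__call_stub_floor to i8*)],
  section "llvm.metadata"
```

In C source this corresponds to marking the stub definition with __attribute__((used)), which clang lowers to exactly this list.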
On 07/24/2013 03:45 PM, reed kotler wrote: > I have some stub functions that are essentially static, but they cannot > be removed. > > What linkage type should I use in that case. Internal linkage appears to > get the functions discarded if they are not referenced under certain > optimizations. > > This is part of the gcc mips16 hack for floating point. > > For example, a function like floor from the math library will be called > with an external reference to function floor. > > At that time, the compiler does not know whether floor was compiled as > mips16 or mips32. > > It generates the call to floor as if it is a mips16 compiled function. > > It also generates a stub that the linker can use if during link time it > is discovered that "floor" is a mips32 function. > > The stubs, which are mips32 code, will move the first argument register > into a floating point register, call floor, and upon return will move > the integer return value into a floating point register. > > If I designate this function as having internal linkage, then in some > cases, the optimizer will throw it away. > > In that case, at link time it will call floor, and if floor is compiled > as mips32, this will break a mips16 compiled program. > > The linker does not know there is an issue because only functions with > certain kinds of signatures need a helper function for the call. From rogelio.serrano at gmail.com Wed Jul 24 17:35:31 2013 From: rogelio.serrano at gmail.com (Rogelio Serrano) Date: Thu, 25 Jul 2013 08:35:31 +0800 Subject: [LLVMdev] Deprecating and removing the MBlaze backend In-Reply-To: <3947CD34E13C4F4AB2D94AD35AE3FE600708605C@smi-exchange1.smi.local> References: <3947CD34E13C4F4AB2D94AD35AE3FE600708605C@smi-exchange1.smi.local> Message-ID: On Jul 25, 2013 5:29 AM, "Micah Villmow" wrote: > > Chandler brought up removing it back in February but Rogelio Serrano said he could maintain it and Jeff Fifield from Xilinx was supposed to check if someone could help. 
Most of my customers changed their minds and dropped mblaze. One stuck to gcc and the rest moved to another arch and toolchain. I'm sorry, guys. > > If no one has stepped up in the past 5 months, then I don't see an issue with removing it. > > Micah > > > -----Original Message----- > > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > > On Behalf Of Eric Christopher > > Sent: Wednesday, July 24, 2013 2:10 PM > > To: peckw at wesleypeck.com > > Cc: llvmdev at cs.uiuc.edu > > Subject: [LLVMdev] Deprecating and removing the MBlaze backend > > > > Doesn't seem to get a lot of love since most of the commits in the last 3 years > > have been maintenance. I guess it doesn't take a whole lot of maintenance > > either, but... > > > > cc'ing Wesley since he seems to be the last guy to commit to it. > > > > Thoughts? > > > > -eric > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From atrick at apple.com Wed Jul 24 19:19:22 2013 From: atrick at apple.com (Andrew Trick) Date: Wed, 24 Jul 2013 19:19:22 -0700 Subject: [LLVMdev] Pre-RA scheduler details In-Reply-To: References: Message-ID: <4764D669-EC00-468F-860C-D944149C6AA1@apple.com> On Jul 24, 2013, at 12:17 PM, Chenyang Liu wrote: > Hi, > > I'm interested in the two pre-RA instruction schedulers used in LLVM, list-hybrid and list-ilp. I've done some digging on the internet and played around with executing some test files using the two schedulers. However, I'm still uncertain of the behaviors and heuristics used in each.
> > For example, the XXXX_ls_rr_sort::isReady for hybrid includes a 3 cycle readydelay (seems arbitrary) whereas the ilp version readies all instructions once data dependencies are resolved. > > Additionally, the ilp_ls_rr_sort::operator() just calls BURRSort at the end after applying a bunch of heuristics to the queue beforehand. Reading it is quite laborious (very few comments) and was wondering if anyone had any references to what exactly it is doing and why it is doing it in such order. > > I'm focused right now on understanding the behavior of the two schedulers. If anyone could give me some direction (papers, slides) that would be greatly appreciated. To be blunt, those heuristics are ad-hoc at best and can't be justified in any way other than running benchmarks. They were added to avoid pathologically bad scheduling in a few cases, then tuned based on this formula: list-ilp = do the least damage overall running the test-suite on x86_64 list-hybrid = do the least damage on armv7 Those will both be retired by the MachineScheduler (MI Scheduler). How to turn it on: http://article.gmane.org/gmane.comp.compilers.llvm.devel/63242 Other recent messages: http://article.gmane.org/gmane.comp.compilers.llvm.devel/63747 http://article.gmane.org/gmane.comp.compilers.llvm.devel/64145 Dev mtg BOF: http://llvm.org/devmtg/2012-11/Larin-Trick-Scheduling.pdf Information is scattered now between the comments, commit log, and mailing list. A design document is a good idea. I can commit to publishing one as soon as the scheduler is enabled for mainstream architectures. The design is still evolving. The priority is supporting all targets with a great deal of flexibility and avoiding poor scheduling in strange cases. It's not really about optimal scheduling for some popular benchmarks. Optimal scheduling is very processor specific. 
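For anyone following along, the schedulers under discussion can be selected explicitly when invoking llc, and the MachineScheduler can be enabled for comparison. Flag spellings below are from LLVM of this period; verify against llc -help-hidden before relying on them:

```
# Pick a pre-RA scheduler explicitly (the default depends on the subtarget):
llc -pre-RA-sched=list-ilp foo.ll -o foo-ilp.s
llc -pre-RA-sched=list-hybrid foo.ll -o foo-hybrid.s

# Experiment with the MachineScheduler instead:
llc -enable-misched foo.ll -o foo-misched.s
```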
The idea is that backend implementers can invest as little or as much effort into plugging in their own scheduling heuristics as makes sense for their project. A rich set of tools is available for them. Thanks for your interest, -Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: From loupicciano at comcast.net Wed Jul 24 19:38:34 2013 From: loupicciano at comcast.net (Lou Picciano) Date: Thu, 25 Jul 2013 02:38:34 +0000 (UTC) Subject: [LLVMdev] First Pass at building dragon egg-3.3 for clang 3.3 - using gcc-4.7 Message-ID: <599444808.1137181.1374719914773.JavaMail.root@sz0093a.westchester.pa.mail.comcast.net> LLVM Friends, First time attempting a build of dragonegg, using our shiny new install of clang-3.3. I'm clearly off to a terrifying start! dragonegg-3.3.src$ CXX=/usr/bin/gcc GCC=/usr/bin/gcc ENABLE_LLVM_PLUGINS=1 LLVM_CONFIG=/usr/bin/llvm-config CFLAGS=-I/usr/clang/3.3/lib/clang/3.3/include CXXFLAGS="-I/usr/clang/3.3/lib/clang/3.3/include" make Compiling utils/TargetInfo.cpp In file included from /usr/clang/3.3/include/llvm/Support/DataTypes.h:67:0, from /usr/clang/3.3/include/llvm/Support/type_traits.h:20, from /usr/clang/3.3/include/llvm/ADT/StringRef.h:13, from /usr/clang/3.3/include/llvm/ADT/Twine.h:13, from /usr/clang/3.3/include/llvm/ADT/Triple.h:13, from /home/drlou/Downloads/dragonegg-3.3.src/utils/TargetInfo.cpp:23: /usr/clang/3.3/lib/clang/3.3/include/stdint.h:32:54: error: missing binary operator before token "(" /usr/clang/3.3/lib/clang/3.3/include/stdint.h:187:0: warning: "__int_least32_t" redefined [enabled by default] /usr/clang/3.3/lib/clang/3.3/include/stdint.h:113:0: note: this is the location of the previous definition /usr/clang/3.3/lib/clang/3.3/include/stdint.h:188:0: warning: "__uint_least32_t" redefined [enabled by default] /usr/clang/3.3/lib/clang/3.3/include/stdint.h:114:0: note: this is the location of the previous definition /usr/clang/3.3/lib/clang/3.3/include/stdint.h:189:0: warning:
"__int_least16_t" redefined [enabled by default] /usr/clang/3.3/lib/clang/3.3/include/stdint.h:115:0: note: this is the location of the previous definition -------------- next part -------------- An HTML attachment was scrubbed... URL: From shining.llvm at gmail.com Wed Jul 24 19:56:46 2013 From: shining.llvm at gmail.com (ning Shi) Date: Thu, 25 Jul 2013 10:56:46 +0800 Subject: [LLVMdev] Insert new basic blocks In-Reply-To: References: Message-ID: You can get the context of the module and the function into which you want to insert the basic block. Then you can use BasicBlock::Create(context, "NameofBasicBlock", F); to create your basic block. 2013/7/24 Rasha Omar > Using Module Pass > How could I insert new basic blocks such as > while(i==1) {} > in the IR and also change the predecessors and successors according to > inserting these basic blocks. > > -- > *Rasha Salah Omar > Msc Student at E-JUST > Demonestrator at Faculty of Computers and Informatics > Benha University > * > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From clattner at apple.com Wed Jul 24 21:20:50 2013 From: clattner at apple.com (Chris Lattner) Date: Wed, 24 Jul 2013 21:20:50 -0700 Subject: [LLVMdev] Deprecating and removing the MBlaze backend In-Reply-To: References: Message-ID: <6E6EDD16-5F53-46F0-A8C9-9EF01247EBA0@apple.com> On Jul 24, 2013, at 2:10 PM, Eric Christopher wrote: > Doesn't seem to get a lot of love since most of the commits in the > last 3 years have been maintenance. I guess it doesn't take a whole > lot of maintenance either, but... > > cc'ing Wesley since he seems to be the last guy to commit to it. I say that we drop it. If someone steps up to start maintaining it, they can begin by resurrecting it from SVN.
-Chris From clattner at apple.com Wed Jul 24 21:24:13 2013 From: clattner at apple.com (Chris Lattner) Date: Wed, 24 Jul 2013 21:24:13 -0700 Subject: [LLVMdev] [cfe-dev] Host compiler requirements: Dropping VS 2008, using C++11? In-Reply-To: References: Message-ID: On Jul 23, 2013, at 9:12 PM, Chandler Carruth wrote: > On Tue, Jul 23, 2013 at 7:17 PM, Wang Qi wrote: > -1. > > I believe there are still a lot of people using VC 2008, though I can't give the data. > VC 2008 (even the express version) is enough for a lot of development, such like game development. > Unless someone within the community steps forward with a powerful argument to continue to support VC 2008, I'm going to make the call: we don't support it. +1. I completely agree. -Chris > > Why? I don't actually disagree with your points, but I think there are overriding concerns: > > 1) The pragmatic fact is that we simply don't have enough contributors and active (within the community) users that use, exercise, report bugs, and provide patches to support VC2008 to credibly say we support it. The fact is that we already don't, and we won't going forward regardless of what folks say on this email thread. > > 2) #1 isn't a problem that it is worth it to the community to solve. That is, I would rather have the developers and members of the community working to better support more modern Windows platforms rather than this old one. So I think it is actively in our best interest to not invest in changing #1. > > 3) LLVM (and its subprojects) have a long history of beneficially tracking and leveraging modern aspects of C++. We want to do more of this faster, not less of it slower, because it significantly improves the cleanliness, maintainability, simplicity, and performance of our libraries. To this end, it is directly to the benefit of the project to stay as close as possible to the latest versions of the various toolchain vendors. 
> > 4) Users of LLVM that are necessarily dealing with an unchanging toolchain and environment always have the option of freezing their version of LLVM along with that environment, or working assiduously to build a sufficiently strong role within the community to both provide the necessary testing and fixes for the environment (#1 above) and overcome the burden it places on the rest of the project (#3). > > At this point, I suspect we should put the subject to rest. > _______________________________________________ > cfe-dev mailing list > cfe-dev at cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From clattner at apple.com Wed Jul 24 21:37:17 2013 From: clattner at apple.com (Chris Lattner) Date: Wed, 24 Jul 2013 21:37:17 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> Message-ID: On Jul 22, 2013, at 2:25 PM, Chandler Carruth wrote: > > On Mon, Jul 22, 2013 at 2:21 PM, Eric Christopher wrote: > >> This is pretty much the same as what Quentin proposed (with the addition of the enum), isn't it? > >> > > > > Pretty close yeah. > > > > Another thought and alternate strategy for dealing with these sorts of things: > > A much more broad set of callback machinery that allows the backend to > communicate values or other information back to the front end that can > then decide what to do. We can define an interface around this, but > instead of having the backend vending diagnostics we have the callback > take a "do something with this value" which can just be "communicate > it back to the front end" or a diagnostic callback can be passed down > from the front end, etc. 
> > This will probably take a bit more design to get a general framework > set up, but imagine the usefulness of say being able to automatically > reschedule a jitted function to a thread with a larger default stack > size if the callback states that the thread size was N+1 where N is > the size of the stack for a thread you've created. > > FWIW, *this* is what I was trying to get across. Not that it wouldn't be a callback-based mechanism, but that it should be a fully general mechanism rather than having something to do with warnings, errors, notes, etc. If a frontend chooses to use it to produce such diagnostics, cool, but there are other use cases that the same machinery should serve. How about this: keep the jist of the current API, but drop the "warning"- or "error"-ness of the API. Instead, the backend just includes an enum value (plus string message for extra data). The frontend makes the decision of how to render the diagnostic (or not, dropping them is fine) along with how to map them onto warning/error or whatever concepts they use. -Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: From dblaikie at gmail.com Wed Jul 24 22:16:11 2013 From: dblaikie at gmail.com (David Blaikie) Date: Wed, 24 Jul 2013 22:16:11 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> Message-ID: On Wed, Jul 24, 2013 at 9:37 PM, Chris Lattner wrote: > > On Jul 22, 2013, at 2:25 PM, Chandler Carruth wrote: > > > On Mon, Jul 22, 2013 at 2:21 PM, Eric Christopher > wrote: >> >> >> This is pretty much the same as what Quentin proposed (with the >> >> addition of the enum), isn't it? >> >> >> > >> > Pretty close yeah. 
>> > >> >> Another thought and alternate strategy for dealing with these sorts of >> things: >> >> A much more broad set of callback machinery that allows the backend to >> communicate values or other information back to the front end that can >> then decide what to do. We can define an interface around this, but >> instead of having the backend vending diagnostics we have the callback >> take a "do something with this value" which can just be "communicate >> it back to the front end" or a diagnostic callback can be passed down >> from the front end, etc. >> >> This will probably take a bit more design to get a general framework >> set up, but imagine the usefulness of say being able to automatically >> reschedule a jitted function to a thread with a larger default stack >> size if the callback states that the thread size was N+1 where N is >> the size of the stack for a thread you've created. > > > FWIW, *this* is what I was trying to get across. Not that it wouldn't be a > callback-based mechanism, but that it should be a fully general mechanism > rather than having something to do with warnings, errors, notes, etc. If a > frontend chooses to use it to produce such diagnostics, cool, but there are > other use cases that the same machinery should serve. > > > How about this: keep the jist of the current API, but drop the "warning"- or > "error"-ness of the API. Instead, the backend just includes an enum value > (plus string message for extra data). The frontend makes the decision of > how to render the diagnostic (or not, dropping them is fine) along with how > to map them onto warning/error or whatever concepts they use. 
For the cases where backend diagnostics make sense (such as asm parsing or the like) it'll make sense for the backend to include a warning/error classification (& I don't think its absence is the key to what Chandler's getting at - the point is to avoid phrasing things as "diagnostics" from LLVM whenever it's possible to expose some broader information & let frontends do whatever they want with it) - that'll save every frontend having to make a, probably very similar, classification - especially in the case of errors it seems unlikely a frontend could do anything else but classify it as an error. The diagnostic blob should also include some 'id' though, yes, so frontends can, in a pinch, classify things differently (we have an LLVM flag "-fatal-assembly-warnings" or the like - we could kill that & just have the client callback on diagnostics elevate the warning to an error (indeed Clang already has infrastructure for promoting warnings to errors, so that should be used - no need for other implementations of the same idea)). I'm not quite clear from your suggestion whether you're suggesting the backend would produce a complete diagnostic string, or just the parameters - requiring/leaving it up to the frontend to have a full textual string for each backend diagnostic with the right number of placeholders, etc. I'm sort of in two minds about that - I like the idea that frontends keep all the user-rendered text (means localization issues are in one place, the frontend - rather than ending up with English text backend diagnostics rendered in a non-English client (yeah, pretty hypothetical, I'm not sure anyone has localized uses of LLVM)). But this does mean that there's no "free ride" - frontends must have some explicit handling of each backend diagnostic (some crappy worst-case fallback, but it won't be a useful message).
& I don't think this avoids the desire to have non-diagnostic callbacks whenever possible (notify of interesting things, frontends can decide whether to use that information to emit a diagnostic based on some criteria or behave differently in another way). From dblaikie at gmail.com Wed Jul 24 22:20:22 2013 From: dblaikie at gmail.com (David Blaikie) Date: Wed, 24 Jul 2013 22:20:22 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: <7C305296-ADC0-4279-8DC2-A0339F14ADB8@apple.com> References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> <7C305296-ADC0-4279-8DC2-A0339F14ADB8@apple.com> Message-ID: On Tue, Jul 23, 2013 at 4:55 PM, Quentin Colombet wrote: > Hi David, > > Thanks for the feedback/clarifications, I appreciate it. > You will find my comments inlined with your mail. > > For a quick reference, here is the summary. > > ** Summary ** > > The other considered approach would provide some callbacks for > events/information we think are important. > Like Chris pointed out, some of the information is not available for > querying, thus the callback should provide sufficient information. Sure - the idea is we'd just flesh this out based on user-demand. Anything we need, we plumb through. A Simple Matter of Programming. (yeah, some might say a maintenance burden, but I'm still assuming we don't have, nor want, many of these) > The client of the callback could then query LLVM if the callback does not > provide sufficient information (and assuming more information would be > available via querying). Sure > > At the moment, I see the following use cases: > - Optimization hints (see Hal’s idea). > - Fatal error/warning reporting (this could be done via the first proposal: > enum + bool + message.).
See my reply to Chris regarding the tradeoffs between enum + classification (there's more than just error/warning - it should be an enum in any case, even if we only support those two for now) + message and enum + (maybe a classification) + placeholder parameters (which could be strings, or possibly more than strings - such as DI* related stuff which might allow us to link back to original source variables in a frontend). > - Stack size reporting. The point with stack size is to indicate that this is one (particularly common/canonical) example of when a warning that might obviously be implemented by the LLVM warning callback could be better served by a more general-purpose callback. That's my understanding anyway - again, this is hardly my domain, so take what I have with a teacup of salt - in part I'm just trying to help the dialog along so we're all on the same page, not express the relative value of these options. > > What else? > Thoughts? > > Thanks again for all the feedbacks. > > Cheers, > > -Quentin > > > On Jul 23, 2013, at 3:37 PM, David Blaikie wrote: > > On Mon, Jul 22, 2013 at 4:17 PM, Quentin Colombet > wrote: > > Hi, > > Compared to my previous email, I have added Hal’s idea for formatting the > message and pull back some idea from the "querying framework”. > Indeed, I propose to add some information in the reporting so that a > front-end (more generally a client) can filter the diagnostics or take > proper actions. > See the details hereafter. > > On Jul 22, 2013, at 2:25 PM, Chandler Carruth wrote: > > > On Mon, Jul 22, 2013 at 2:21 PM, Eric Christopher > wrote: > > > This is pretty much the same as what Quentin proposed (with the > addition of the enum), isn't it? > > > Pretty close yeah. > > > Another thought and alternate strategy for dealing with these sorts of > things: > > A much more broad set of callback machinery that allows the backend to > communicate values or other information back to the front end that can > then decide what to do. 
We can define an interface around this, but > instead of having the backend vending diagnostics we have the callback > take a "do something with this value" which can just be "communicate > it back to the front end" or a diagnostic callback can be passed down > from the front end, etc. > > This will probably take a bit more design to get a general framework > set up, but imagine the usefulness of say being able to automatically > reschedule a jitted function to a thread with a larger default stack > size if the callback states that the thread size was N+1 where N is > the size of the stack for a thread you've created. > > > > FWIW, *this* is what I was trying to get across. Not that it wouldn't be a > callback-based mechanism, but that it should be a fully general mechanism > rather than having something to do with warnings, errors, notes, etc. If a > frontend chooses to use it to produce such diagnostics, cool, but there are > other use cases that the same machinery should serve. > > > I like the general idea. > > To be sure I understood the proposal, let me give an example. > > ** Example ** > The compiler says here is the size of the stack for Loc via a “handler” > (“handler" in the sense whatever mechanism we come up to make such > communication possible). Then the front-end builds the diagnostic from that > information (or query for more if needed) or drops everything if it does not > care about this size for instance (either it does not care at all or the > size is small enough compared to its setting). > > ** Comments ** > Unless we have one handler per -kind of - use, and I would like to avoid > that, > > > I think that's, somewhat, Chandlers point (sorry, I hate to play > Telephone here - but hope to help clarify some positions... apologies > if this just further obfuscates/confuses). 
I believe the idea is some > kind of generic callback type with a bunch of no-op-default callbacks, > override the ones your frontend cares about ("void onStackSize(size_t > bytes)", etc...). > > I see. > > > Yes, we could come up with a system that doesn't require adding a new > function call for every piece of data that needs to be provided. Does it seem > likely we'll have so many of these that we'll really want/need that? > > > That’s a good point. > I guess there are use cases that we may not anticipate and it would be nice > if we do not have to modify this interface for that. > In particular, out-of-tree targets may want to do fancy things without having to > push for changes in the public tree. > > For now, I believe it is great that many people have contributed their ideas > and use cases; we can then decide what we want and what we do not want to > address in the near future. > > Anyway, I agree that we should focus on what we care about most when we are > done with collecting the use cases (are we done?). > > > I think we should still provide information on the severity of the > thing we are reporting and what we are reporting. > Basically: > - Severity: Will the back-end abort after the information is passed down, or will > it continue (the boolean of the previous proposal)? > > > In the case of specific callbacks - that would be statically known > (you might have a callback for some particular problem (we ran out of > registers & can't satisfy this inline asm due to the register > allocation of this function) - it's the very contract that that > callback is a fatal problem). If we have a more generic callback > mechanism, yes - we could enshrine some general properties (such as > fatality) in the common part & leave the specifics of what kind of > fatal problem to the 'blob'. > > > Agreed. > >
> > I also think we should be able to provide a default (formatted) message, > such that a client that does not need to know what to do with the > information can still print something somehow useful, especially on abort > cases. > > > Do you have some examples of fatal/abort cases we already have & how > they're reported today? (including what kind of textual description we > use?) > > > Here are a few example of some warnings that you can find in the back end: > - IntrinsicLowering: > case Intrinsic::stackrestore: { > if (!Warned) > > errs() << “WARNING: this target does not support the llvm.stack" > > << (Callee->getIntrinsicID() == Intrinsic::stacksave ? > "save" : "restore") << " intrinsic.\n”; > - PrologEpilogInserter: > errs() << "warning: Stack size limit exceeded (" << MFI->getStackSize() > << ") in " << Fn.getName() << ".\n”; > > Actually, when I wrote this, I had Hal’s Optimization Diary in mind: > > - "*This loop* cannot be optimized because the induction variable *i* is > unsigned, and cannot be proved not to wrap" > - "*This loop* cannot be vectorized because the compiler cannot prove that > memory read from *a* does not overlap with memory written to through *b*" > - "*This loop* cannot be vectorized because the compiler cannot prove that > it is unconditionally safe to dereference the pointer *a*. > > - The message string is text but a single kind of markup is allowed: > , for example: > "We cannot vectorize because is an unfriendly > variable" > (where the first will be replaced by text derived from a DIScope and the > second from a DIVariable). > > > > Thus, it sounds a good thing to me to have a string with some markers to > format the output plus the arguments to be used in the formatted output. > Hal’s proposal could do the trick (although I do not know if DIDescriptor > are the best thing to use here). 
> > ** Summary ** > I am starting to think that we should be able to cover the reporting case > plus some querying mechanism with something like: > void reportSomehtingToFEHandler(enum Reporting Kind, bool IsAbort, information>, const char* DefautMsg, the defautMsg>) > > > Personally I dislike losing type safety in this kind of API ("here's a > blob of data you must programmatically query based on a schema implied > by the 'Kind' parameter & some documentation you read"). I'd prefer > explicit callbacks per thing - if we're going to have to write an > explicit structure & document the parameters to each of these > callbacks anyway, it seems easier to document that by API. (for fatal > cases we could have no default implementations - this would ensure > clients would be required to update for new callbacks & not > accidentally suppress them) > > > If we want to let people do everything they want without modifying the > existing structure, I think we need both. > I agree that for the cases we care the most, we could specialize that > approach with a specific call back for each cases. > > Also note that the initial intend was to report error/warning, thus the > is not required. > My point is, if we do not rely on the front-end to take specific action, > there is no need to pass this information. Therefore, we can eliminate that > type safety problem. > > > Where is supposed to be the class/struct/pointer to the > relevant information for this kind. If it is not enough the FE should call > additional APIs to get what it wants. > > This looks similar to the “classical” back-end report to front-end approach, > but gives more freedom to the front-end as it can choose what to do based on > the attached information. > I also believe this will reduce the need to expose back-end APIs and speed > up the process. > > > Speed up the process of adding these diagnostic, possibly at the cost > of having a more opaque/inscrutible API to data from LLVM, it seems. 
> > > Good point and it matches what I have written above. > > Indeed, if we rely on the front-end to query for more information when it > gets a reporting like this, we do not need to pass the information and we > avoid this problem. > > > However, the ability of the front-end (or client) to query the back-end is > limited to the places where the back-end is reporting something. Also, if > the back-end is meant to abort, the FE cannot do anything about it (e.g., > the stack size is not big enough for the jitted function). > That’s why I said it cover “some" querying mechanism. > > ** Concerns ** > 1. Testing. > > > Testing - I assume we'd have opt/llc register for all these callbacks > & print them in some way (it doesn't need to have a "stack size is too > small warning" it just needs to print the stack size whenever it's > told - or maybe have some way to opt in to callback rendering) & then > check the behavior with FileCheck as usual (perhaps print this stuff > to stderr so it doesn't get confused with bytecode/asm under -o -). > > That tests LLVM's contract - that it called the notifications. > Testing Clang's behavior when these notifications are provided would > either require end-to-end testing (just having Clang tests that run > LLVM, assume it already passes the LLVM-only tests & then tests Clang > behavior on top of that) as we do in a few places already - or have > some kind of stub callback implementation we can point Clang to (it > could read a file of callbacks to call). That would be nice, but going > on past experience I don't suppose anyone would actually bother to > implement it. > > > Assuming we will always emit these reports, relying on a front-end to filter > out what is not currently relevant (e.g., we did not set the stack size > warning in the FE), what will happen when we test (make check) without a > front-end? > I am afraid we will pollute all tests or we will have some difficulty to > test a specific reporting. > > 2. 
Regarding a completely query based approach, like Chris pointed out, I do > not see how we can report consistent information at any given time. Also, > Eric, coming back to your jit example, how could we prevent the back-end from > aborting if the jitted function is too big for the stack? > > > Eric's (originally Chandler's, discussed in person) example wasn't > about aborting compilation. The idea was you JIT a function, you get a > callback saying "stack size 1 million bytes" and so you spin up a > thread that has a big stack to run this function you just compiled. > > > Thanks for the clarification. > > > The point of the example is that a pure warning-based callback is > 'general' for LLVM but very specific for LLVM /clients/ (LLVM as a > library, something we should keep in mind) - all they can do is print > it out. If we provide a more general feature for LLVM clients > (callbacks that notify those clients about things they might be > interested in, like the size of a function's stack) then they can > build other features (apart from just warning) such as a JIT that > dynamically chooses thread stack size based on the stack size of the > functions it JITs. > > > I see your point. > > > > 3. Back to the strictly reporting approach where we extend the inlineasm > handler (the approach proposed by Bob and that I sketched a little bit > more), which now looks similar to this approach except that the back-end chooses > what is relevant to report and does not need to pass down > the information. > The concern is how do we easily (in a robust and extendable manner) provide > a front-end/back-end option for each warning/error? > > > Thoughts?
> > Cheers, > > -Quentin > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > From clattner at apple.com Wed Jul 24 22:23:27 2013 From: clattner at apple.com (Chris Lattner) Date: Wed, 24 Jul 2013 22:23:27 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> Message-ID: <508C534A-D051-46BA-9C20-8B6D344BA508@apple.com> On Jul 24, 2013, at 10:16 PM, David Blaikie wrote: >> How about this: keep the jist of the current API, but drop the "warning"- or >> "error"-ness of the API. Instead, the backend just includes an enum value >> (plus string message for extra data). The frontend makes the decision of >> how to render the diagnostic (or not, dropping them is fine) along with how >> to map them onto warning/error or whatever concepts they use. > > I'm not quite clear from your suggestion whether you're suggesting the > backend would produce a complete diagnostic string, or just the > parameters - requiring/leaving it up to the frontend to have a full > textual string for each backend diagnostic with the right number of > placeholders, etc. I'm sort of in two minds about that - I like the > idea that frontends keep all the user-rendered text (means > localization issues are in one place, the frontend - rather than > ending up with english text backend diagnostics rendered in a > non-english client (yeah, pretty hypothetical, I'm not sure anyone has > localized uses of LLVM)). 
But this does mean that there's no "free > ride" - frontends must have some explicit handling of each backend > diagnostic (some crappy worst-case fallback, but it won't be a useful > message). I don't have a specific proposal in mind, other than thinking along the exact same lines as you above. :) The best approach is probably hybrid: the diagnostic producer can produce *both* a full string like today, as well as an "ID + enum" pair. This way, clang can use the later, but llc (as one example of something we want to keep simple) could print the former, and frontends that get unknown enums could fall back on the full string. > & I don't think this avoids the desire to have non-diagnostic > callbacks whenever possible (notify of interesting things, frontends > can decide whether to use that information to emit a diagnostic based > on some criteria or behave differently in another way). Sure, but we also don't want to block progress in some area because we have a desire to solve a bigger problem. -Chris From dblaikie at gmail.com Wed Jul 24 22:30:34 2013 From: dblaikie at gmail.com (David Blaikie) Date: Wed, 24 Jul 2013 22:30:34 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: <508C534A-D051-46BA-9C20-8B6D344BA508@apple.com> References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> <508C534A-D051-46BA-9C20-8B6D344BA508@apple.com> Message-ID: On Wed, Jul 24, 2013 at 10:23 PM, Chris Lattner wrote: > On Jul 24, 2013, at 10:16 PM, David Blaikie wrote: >>> How about this: keep the jist of the current API, but drop the "warning"- or >>> "error"-ness of the API. Instead, the backend just includes an enum value >>> (plus string message for extra data). The frontend makes the decision of >>> how to render the diagnostic (or not, dropping them is fine) along with how >>> to map them onto warning/error or whatever concepts they use. 
>> >> I'm not quite clear from your suggestion whether you're suggesting the >> backend would produce a complete diagnostic string, or just the >> parameters - requiring/leaving it up to the frontend to have a full >> textual string for each backend diagnostic with the right number of >> placeholders, etc. I'm sort of in two minds about that - I like the >> idea that frontends keep all the user-rendered text (means >> localization issues are in one place, the frontend - rather than >> ending up with english text backend diagnostics rendered in a >> non-english client (yeah, pretty hypothetical, I'm not sure anyone has >> localized uses of LLVM)). But this does mean that there's no "free >> ride" - frontends must have some explicit handling of each backend >> diagnostic (some crappy worst-case fallback, but it won't be a useful >> message). > > I don't have a specific proposal in mind, other than thinking along the exact same lines as you above. :) > > The best approach is probably hybrid: the diagnostic producer can produce *both* a full string like today, as well as an "ID + enum" pair. This way, clang can use the later, but llc (as one example of something we want to keep simple) could print the former, and frontends that get unknown enums could fall back on the full string. Fair-ish. If it were just for the sake of llc it'd be hard to justify having the strings in LLVM rather than just in llc itself, but providing them as fallbacks is probably reasonable/convenient & not likely to be a technical burden. Hopefully if we wanted that we'd still put something in Clang to maintain the frontend diagnostic line rather than letting it slip. > >> & I don't think this avoids the desire to have non-diagnostic >> callbacks whenever possible (notify of interesting things, frontends >> can decide whether to use that information to emit a diagnostic based >> on some criteria or behave differently in another way). 
> > Sure, but we also don't want to block progress in some area because we have a desire to solve a bigger problem. Sure enough - I think the only reason to pre-empt the bigger problem is to ensure that the immediate progress doesn't lead to bad implementations of those bigger issues being committed due to convenience. Ensuring that the solution we implement now makes it hard to justify (not hard to /do/ badly, just hard to justify doing it badly by ensuring that the right solution is convenient/easy) taking shortcuts later would be good. That might just be the difference between having a function pointer callback for the diagnostic case and instead having a callback type with the diagnostic callback as the first one, with the intent to add more in cases where backend-diagnostics aren't the right tool. That way we have an callback interface we can easily extend. (ideally I'd love to have two things in the callback interface early, as an example - but that's not necessary & probably won't happen) I don't know enough about the particular things Quentin's planning to implement to know whether any of them fall into the "probably shouldn't be an LLVM diagnostic" bag. From westdac at gmail.com Wed Jul 24 23:16:00 2013 From: westdac at gmail.com (Dan) Date: Thu, 25 Jul 2013 00:16:00 -0600 Subject: [LLVMdev] Clang/LLVM 3.3 unwanted attributes being added: NoFramePointerElim Message-ID: Since updating to LLVM 3.3, the system is generating attributes such as: attributes #0 = { nounwind "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" } I've tried to add options. 
I've tracked the code to: NoFramePointerElim I've seen the description of: ./lib/Target/TargetMachine.cpp RESET_OPTION(NoFramePointerElim, "no-frame-pointer-elim"); RESET_OPTION(NoFramePointerElimNonLeaf, "no-frame-pointer-elim-non-leaf"); RESET_OPTION(LessPreciseFPMADOption, "less-precise-fpmad"); RESET_OPTION(UnsafeFPMath, "unsafe-fp-math"); RESET_OPTION(NoInfsFPMath, "no-infs-fp-math"); RESET_OPTION(NoNaNsFPMath, "no-nans-fp-math"); RESET_OPTION(UseSoftFloat, "use-soft-float"); RESET_OPTION(DisableTailCalls, "disable-tail-calls"); I cannot find the code or mechanism to turn off: NoFramePointerElim No code generator has these specialized, so this is happening for all targets. Any help? From g.franceschetti at vidya.it Wed Jul 24 23:51:06 2013 From: g.franceschetti at vidya.it (Giorgio Franceschetti *) Date: Thu, 25 Jul 2013 08:51:06 +0200 Subject: [LLVMdev] Build Clang and LLVM on Win 8 In-Reply-To: <87mwpdhrjz.fsf@wanadoo.es> References: <51EC3731.1040104@vidya.it> <874nbnk2e8.fsf@wanadoo.es> <87y58zidax.fsf@wanadoo.es> <51ED9B35.4010304@vidya.it> <51EE131F.70008@vidya.it> <87mwpdhrjz.fsf@wanadoo.es> Message-ID: <51F0CADA.9090602@vidya.it> Thanks for your reply. Compiler-rt was a problem. From the documentation I thought it was mandatory (so what is it used for?), but it was giving all those errors about stdbool.h missing that I reported. Now things got better, but I still have problems. I receive an error that seems related to the fact that the grep command is missing. Is that possible? If grep is needed, how can I find it on Windows? Thanks in advance, Giorgio On 23/07/2013 23:53, Óscar Fuentes wrote: > Giorgio Franceschetti writes: > >> I also tried to build LLVM with 3.3 sources. >> Same problems. > If you omit compiler-rt, does it work? (compiler-rt is not a required > component.) > >> Even worse, Visual Studio hangs and I had to kill the process. >> >> What could it be? Is Visual Studio 2012 working with LLVM/clang?
>> Or LLVM/Clang is not supposed to work on Windows (I also saw that >> there are no binaries ready for the Windows platform). > For the most part, LLVM works fine on Windows, with some limitations. > Clang has serious shortcomings, more so if you build it with VS (MinGW > is better because Clang can use the headers and standard C++ library > that comes with MinGW, but not the standard C++ library that comes with > VS.) > >>> It lists a lot of 'file not found' messages during the execution, but at the >>> end it does create the Visual Studio projects. >>> Based on the web guide, it should be successful. >>> First question, is it really? > Yes. What you are seeing are the platform checks, where the build system > looks for the presence of functions, headers, etc., and then generates a > configuration file with that information. > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > From Greg_Bedwell at sn.scee.net Thu Jul 25 00:03:28 2013 From: Greg_Bedwell at sn.scee.net (Greg_Bedwell at sn.scee.net) Date: Thu, 25 Jul 2013 08:03:28 +0100 Subject: [LLVMdev] Build Clang and LLVM on Win 8 In-Reply-To: <51F0CADA.9090602@vidya.it> References: <51EC3731.1040104@vidya.it> <874nbnk2e8.fsf@wanadoo.es> <87y58zidax.fsf@wanadoo.es> <51ED9B35.4010304@vidya.it> <51EE131F.70008@vidya.it> <87mwpdhrjz.fsf@wanadoo.es> <51F0CADA.9090602@vidya.it> Message-ID: Hi Giorgio, > > I receive an error that seems related to the fact that the grep command > is missing. > > Is that possible? If grep is needed, how can I find it on Windows? > See here: http://clang.llvm.org/hacking.html#testingWindows grep (and a few other required tools) are in the GnuWin32 tools.
Thanks,

Greg Bedwell
SN Systems - Sony Computer Entertainment Group

**********************************************************************
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify postmaster at scee.net

This footnote also confirms that this email message has been checked for all known viruses.

Sony Computer Entertainment Europe Limited
Registered Office: 10 Great Marlborough Street, London W1F 7LP, United Kingdom
Registered in England: 3277793
**********************************************************************

Please consider the environment before printing this e-mail

From t.p.northover at gmail.com Thu Jul 25 00:08:20 2013
From: t.p.northover at gmail.com (Tim Northover)
Date: Thu, 25 Jul 2013 08:08:20 +0100
Subject: [LLVMdev] Steps to addDestination
In-Reply-To: 
References: 
Message-ID: 

Hi Rasha,

> for(rit=Result.begin();rit!=Result.end();++rit)
> {
>   Value* Address = BlockAddress::get(*rit);
>
>   IndirectBrInst *IBI = IndirectBrInst::Create(Address, Result.size(), i->getTerminator());
>   IBI->addDestination(*rit);
> }

This would be creating a block looking something like:

    [ Do stuff ]
    indirectbr i8* blockaddress(@foo, %result1), [label %result1]
    indirectbr i8* blockaddress(@foo, %result2), [label %result2]
    [...]
    indirectbr i8* blockaddress(@foo, %resultN), [label %resultN]

which isn't valid LLVM IR. Each basic block has to have a single branch at the end, so you need to create the IndirectBrInst outside the loop.

How do you decide which of the results you want to jump to at run-time? I think that's the key question you have to answer. You want to emit LLVM IR to make this decision and produce a single Value representing it, then use the value just once to create a single indirectbr.
Your code might look like:

    Value *Address = emitCodeToChooseResult(Results, Loc);
    IndirectBrInst *IBI = IndirectBrInst::Create(Address, Result.size(), i);
    for (rit = Result.begin(); rit != Result.end(); ++rit)
      IBI->addDestination(*rit);

Cheers.

Tim.

From michael.m.kuperstein at intel.com Thu Jul 25 00:17:26 2013
From: michael.m.kuperstein at intel.com (Kuperstein, Michael M)
Date: Thu, 25 Jul 2013 07:17:26 +0000
Subject: [LLVMdev] Does nounwind have semantics?
In-Reply-To: 
References: <58281658-BDAE-4D12-859A-859308EE6FF8@apple.com> <51ECB01C.5020605@mxc.ca> <251BD6D4E6A77E4586B482B33960D2283360813B@HASMSX106.ger.corp.intel.com> <51ECDE23.2020004@mxc.ca> <251BD6D4E6A77E4586B482B33960D22833608266@HASMSX106.ger.corp.intel.com>
Message-ID: <251BD6D4E6A77E4586B482B33960D2283361BB88@HASMSX106.ger.corp.intel.com>

A patch is attached.

Not sure I'm happy with this due to the aforementioned orthogonality concerns, but I really don't have any better ideas. If anyone does, feel free to offer them, I don't mind throwing this patch into the trash.

(Also, not happy with the name, using "speculatable" as Nick suggested, for the lack of other options. If the name stays I'll add it to the documentation.)

Regarding auditing the intrinsics - I'd prefer to do this in stages. Here I'm just preserving the current behavior by marking intrinsics that used to be explicitly handled in isSafeToSpeculativelyExecute(), so there should be no functional change.

From: Nick Lewycky [mailto:nlewycky at google.com]
Sent: Tuesday, July 23, 2013 02:29
To: Kuperstein, Michael M
Cc: Nick Lewycky; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Does nounwind have semantics?

On 22 July 2013 01:11, Kuperstein, Michael M wrote:

Of course frontends are free to put attributes, but it would be nice if optimizations actually used them.
;-) My use case is that of proprietary frontend that happens to know some library function calls - which are only resolved at link time - have no side effects and are safe to execute speculatively, and wants to tell the optimizer it can move them around however it likes. I'll gladly submit a patch that uses these hints, but I'd like to reach some consensus on what the desired attributes actually are first. The last thing I want is to add attributes that are only useful to myself. Regarding having several orthogonal attributes vs. things like "safetospeculate": To know a function is safe to speculatively execute, I need at least: 1) readnone (readonly is insufficient, unless I know all accessed pointers are valid) 2) nounwind 3) nolongjmp (I guess?) 4) no undefined behavior. This includes things like "halting" and "no division by zero", but that's not, by far, an exhaustive list. I guess there are several ways to handle (4). Ideally, I agree with you, we'd like a set of orthogonal attributes that, taken together, imply that the function's behavior is not undefined. But that requires mapping all sources of undefined behavior (I don't think this is currently documented for LLVM IR, at least not in a centralized fashion) and adding a very specific attribute for each of them. I'm not sure having function declarations with "readnone, nounwind, nolongjmp, halting, nodivbyzero, nopoisonval, nocomparelabels, nounreachable, ..." is desirable. We could also have a "welldefined" attribute and a "halting" attribute where "welldefined" subsumes "halting", if the specific case of a function which halts but may have undefined behavior is important. While the two are not orthogonal, it's similar to the situation with "readnone" and "readonly". Does that sound reasonable? You're entirely right. I forgot about undefined behaviour. If you want a 'speculatable' attribute, I would review that patch. 
Please audit the intrinsics (at least the target-independent ones) and appropriate library functions for whether you can apply this attribute to them. I think the only optimization that it can trigger is that "isSafeToSpeculativelyExecute" returns true on it. Anything else? Is it safe to infer readnone and nounwind from speculatable? I should mention that speculatable functions are extraordinarily limited in what they can do in the general (non-LLVM-as-a-library) case. They may be hoisted above calls to fork or pthread_create, they may be moved into global constructors (and thus can't depend on global state), etc. However, since you have a specific library you want to generate code against, you have the power to make use of it. I don't expect clang or dragonegg to be able to make use of it. Nick -----Original Message----- From: Nick Lewycky [mailto:nicholas at mxc.ca] Sent: Monday, July 22, 2013 10:24 To: Kuperstein, Michael M Cc: Andrew Trick; llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Does nounwind have semantics? Kuperstein, Michael M wrote: > I'm not sure I understand why it's blocked on that, by the way. It blocks our ability to automatically deduce the halting attribute in the optimizer, which was necessary for the use case I had at the time. If you have a use case of your own, feel free to propose the patch! (Technically it's not *blocked* -- see how my patch does it! -- but the workarounds are too horrible to be committed.) > Even if we can't apply the attribute ourselves, I don't see why we wouldn't expose that ability to frontends. Frontends are free to put attributes on functions if they want to. Go for it! > I'm not entirely sure "halting" is the right attribute either, by the way. > What I, personally, would like to see is a way to specify a function call is safe to speculatively execute. That implies readnone (not just readonly), nounwind, halting - and Eris knows what else. Nick, is that too strong for you? 
I strongly prefer the approach of having orthogonal attributes. There are optimizations that you can do with each of these attributes on their own. In particular I think that readonly+halting+nounwind+nolongjmp is going to be common and I'd feel silly if we had a special case for readnone+halting+nounwind+nolongjmp and thus couldn't optimize the more common case.

That said, I'm also going to feel silly if we don't end up with enough attributes to allow isSafeToSpeculate to deduce it, which is where we are right now. I was planning to get back to fixing this after Chandler's promised PassManager work.

Nick

> >
> > Michael
> >
> > -----Original Message-----
> > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Nick Lewycky
> > Sent: Monday, July 22, 2013 07:08
> > To: Andrew Trick
> > Cc: llvmdev at cs.uiuc.edu
> > Subject: Re: [LLVMdev] Does nounwind have semantics?
> >
> > Andrew Trick wrote:
> >> Does 'nounwind' have semantics that inform optimization passes? It seems to in some cases, but not consistently. For example...
> >>
> >> int32_t foo(int32_t* ptr) {
> >>   int i = 0;
> >>   int result;
> >>   do {
> >>     bar(ptr);
> >>     result = *ptr;
> >>     bar(ptr);
> >>   } while (i++ < *ptr);
> >>   return result;
> >> }
> >>
> >> Say we have a front end that declares bar as...
> >>
> >> declare void @bar(i32*) readonly;
> >>
> >> So 'bar' is 'readonly' and 'may-unwind'.
> >>
> >> When LICM tries to hoist the load it interprets the 'may-unwind' as "MayThrow" in LICM-language and bails. However, when it tries to sink the call itself it sees the 'readonly', assumes no side effects and sinks it below the loads. Hmm...
> >>
> >> There doesn't appear to be a way to declare a function that is guaranteed not to write to memory in a way that affects the caller, but may have another well-defined side effect like aborting the program. This is interesting, because that is the way runtime checks for safe languages would like to be defined.
I'm perfectly happy telling front ends to generate control flow for well-defined traps, since I like lots of basic blocks in my IR. But I'm still curious how others deal with this.
>
> Yes, we went through a phase where people would try to use "nounwind+readonly == no side-effects" to optimize. All such optimizations are wrong. Unless otherwise proven, a function may inf-loop, terminate the program, or longjmp.
>
> I tried to add 'halting' to help solve part of this a long time ago, but it never went in. The problem is that determining whether you have loops requires a FunctionPass (LoopInfo to find loops and SCEV to determine an upper bound) and applying function attributes is an SCC operation (indeed, an SCC is itself a loop), so it's all blocked behind fixing the PassManager to allow CGSCCPasses to depend on FunctionPasses.
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20100705/103670.html
>
> I'm now in a similar situation where I want 'nounwind' to mean "only exits by terminating the program or a return instruction" but unfortunately functions which longjmp are considered nounwind. I would like to change llvm to make longjmp'ing a form of unwinding (an exceptional exit to the function), but if I were to apply that rule today then we'd start putting dwarf eh tables on all our C code, oops.
>
> Nick
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
> > --------------------------------------------------------------------- Intel Israel (74) Limited This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. _______________________________________________ LLVM Developers mailing list LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev --------------------------------------------------------------------- Intel Israel (74) Limited This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: speculatable.diff Type: application/octet-stream Size: 19923 bytes Desc: speculatable.diff URL: From t.p.northover at gmail.com Thu Jul 25 00:28:48 2013 From: t.p.northover at gmail.com (Tim Northover) Date: Thu, 25 Jul 2013 08:28:48 +0100 Subject: [LLVMdev] Steps to addDestination In-Reply-To: References: Message-ID: > Your code might look like: > Value *Address = emitCodeToChooseResult(Results, InsertLoc); And a particularly silly implementation of emitCodeToChooseResults might decide to call rand() at runtime and jump to whichever block it decided. 
In that case you'd be trying to write code to produce IR resembling:

@.blocks = private unnamed_addr constant [2 x i8*] [i8* blockaddress(@foo, %block1), i8* blockaddress(@foo, %block2)]

declare i32 @rand()

; This whole block would actually be inside
define i32 @foo() {
  %RandomNumber = call i32 @rand()
  %ResultIndex = urem i32 %RandomNumber, 2
  %ResultBlock = getelementptr [2 x i8*]* @.blocks, i32 0, i32 %ResultIndex
  %block = load i8** %ResultBlock
  ; Code above produced by emitCodeToChooseResults, code below produced elsewhere
  indirectbr i8* %block, [label %block1, label %block2]

block1:
  ret i32 0

block2:
  ret i32 42
}

For this, emitCodeToChooseResult would go through the stages:

1. ConstantArray::get to create the array of each blockaddress in Results.
2. Create a new GlobalVariable to store these addresses.
3. Get a reference to rand() using Module->getOrInsertFunction.
4. Emit calls to this RandFunction and a urem to calculate the block index.
5. Create a getelementptr instruction referring to the GlobalVariable and the random result.
6. Create a load from the address calculated.
7. Done, return this loaded value (as in "return LoadedPtr" rather than "ReturnInst::Create(LoadedPtr, ...)").

There are obviously many other ways you could make the decision.

Cheers.

Tim.

From doob at me.com Thu Jul 25 00:49:18 2013
From: doob at me.com (Jacob Carlborg)
Date: Thu, 25 Jul 2013 09:49:18 +0200
Subject: [LLVMdev] ubuntu on the mac
In-Reply-To: 
References: <51EEF351.9030109@mips.com>
Message-ID: 

On 2013-07-24 09:47, Tyler Hardin wrote:
> Not much slower. VBox does an amazing job at getting near native
> performance on modern machines (those with nested paging etc.). This is
> definitely the best option if your computer has ~2g ram and 2+ cores.
> Give the Ubuntu VM 2g and 1 (maybe 2) core/s and it should be fine.

At work, it takes significantly longer to boot our Ruby on Rails application on a virtual machine than natively.
I also noticed that disk access can be quite slow on a virtual machine compared to native.

--
/Jacob Carlborg

From renato.golin at linaro.org Thu Jul 25 01:50:13 2013
From: renato.golin at linaro.org (Renato Golin)
Date: Thu, 25 Jul 2013 09:50:13 +0100
Subject: [LLVMdev] -Os
In-Reply-To: 
References: <51EE6A33.2010203@mips.com> <3431374581995@web12h.yandex.ru> <51EE756F.80503@mips.com> <6B918A28-DDB7-4D3A-B446-C717B702A3FB@apple.com>
Message-ID: 

On 24 July 2013 23:14, Bob Wilson wrote:

> The plan is that all (or at least almost all) of those options will move
> to be function attributes. The backend will be changed to use those
> attributes instead of command line options. That should solve this problem.

Yes, I remember the discussion now, makes sense.

cheers,
--renato
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From nicholas at mxc.ca Thu Jul 25 02:10:35 2013
From: nicholas at mxc.ca (Nick Lewycky)
Date: Thu, 25 Jul 2013 02:10:35 -0700
Subject: [LLVMdev] Does nounwind have semantics?
In-Reply-To: <251BD6D4E6A77E4586B482B33960D2283361BB88@HASMSX106.ger.corp.intel.com>
References: <58281658-BDAE-4D12-859A-859308EE6FF8@apple.com> <51ECB01C.5020605@mxc.ca> <251BD6D4E6A77E4586B482B33960D2283360813B@HASMSX106.ger.corp.intel.com> <51ECDE23.2020004@mxc.ca> <251BD6D4E6A77E4586B482B33960D22833608266@HASMSX106.ger.corp.intel.com> <251BD6D4E6A77E4586B482B33960D2283361BB88@HASMSX106.ger.corp.intel.com>
Message-ID: <51F0EB8B.10406@mxc.ca>

Kuperstein, Michael M wrote:
> A patch is attached.

+  const CallInst* CI = dyn_cast<CallInst>(Inst);
+  return CI->isSafeToSpeculativelyExecute();

"return cast<CallInst>(Inst)->isSafeToSpeculativelyExecute();"? Use cast<> instead of dyn_cast<>. See http://llvm.org/docs/ProgrammersManual.html#isa . Then I don't think it needs to be two lines. You can even remove the extra curly braces around this case.

> Not sure I'm happy with this due to the aforementioned orthogonality
> concerns, but I really don't have any better ideas.
If anyone does, feel > free to offer them, I don’t mind throwing this patch into the trash. > > (Also, not happy with the name, using “speculatable” as Nick suggested, > for the lack of other options. If the name stays I’ll add it to the > documentation.) That reminds me, the patch needs to come with an update to LangRef.rst. > Regarding auditing the intrinsics – I’d prefer to do this in stages. Sounds fine. I'd like Eric to take a quick look and agree that marking debug intrinsics speculatable is sane. (Yes they already are, but that doesn't mean it's sane. I also want Eric to know that 'speculatable' is going to start showing up in his debug info.) One other thing, your patch and Tobias Grosser's patch (subject "Make .bc en/decoding of AttrKind stable") are in conflict. Whoever lands second will need to merge. Nick > Here I’m just preserving the current behavior by marking intrinsics that > used to be explicitly handled in isSafeToSpeculativelyExecute(), so > there should be no functional change. > > *From:*Nick Lewycky [mailto:nlewycky at google.com] > *Sent:* Tuesday, July 23, 2013 02:29 > *To:* Kuperstein, Michael M > *Cc:* Nick Lewycky; llvmdev at cs.uiuc.edu > *Subject:* Re: [LLVMdev] Does nounwind have semantics? > > On 22 July 2013 01:11, Kuperstein, Michael M > > > wrote: > > Of course frontends are free to put attributes, but it would be nice > if optimizations actually used them. ;-) > My use case is that of proprietary frontend that happens to know > some library function calls - which are only resolved at link time - > have no side effects and are safe to execute speculatively, and > wants to tell the optimizer it can move them around however it > likes. I'll gladly submit a patch that uses these hints, but I'd > like to reach some consensus on what the desired attributes actually > are first. The last thing I want is to add attributes that are only > useful to myself. > > Regarding having several orthogonal attributes vs. 
things like > "safetospeculate": > > To know a function is safe to speculatively execute, I need at least: > 1) readnone (readonly is insufficient, unless I know all accessed > pointers are valid) > 2) nounwind > 3) nolongjmp (I guess?) > 4) no undefined behavior. This includes things like "halting" and > "no division by zero", but that's not, by far, an exhaustive list. > > I guess there are several ways to handle (4). > Ideally, I agree with you, we'd like a set of orthogonal attributes > that, taken together, imply that the function's behavior is not > undefined. > But that requires mapping all sources of undefined behavior (I don't > think this is currently documented for LLVM IR, at least not in a > centralized fashion) and adding a very specific attribute for each > of them. I'm not sure having function declarations with "readnone, > nounwind, nolongjmp, halting, nodivbyzero, nopoisonval, > nocomparelabels, nounreachable, ..." is desirable. > > We could also have a "welldefined" attribute and a "halting" > attribute where "welldefined" subsumes "halting", if the specific > case of a function which halts but may have undefined behavior is > important. > While the two are not orthogonal, it's similar to the situation with > "readnone" and "readonly". Does that sound reasonable? > > You're entirely right. I forgot about undefined behaviour. > > If you want a 'speculatable' attribute, I would review that patch. > Please audit the intrinsics (at least the target-independent ones) and > appropriate library functions for whether you can apply this attribute > to them. I think the only optimization that it can trigger is that > "isSafeToSpeculativelyExecute" returns true on it. Anything else? Is it > safe to infer readnone and nounwind from speculatable? > > I should mention that speculatable functions are extraordinarily limited > in what they can do in the general (non–LLVM-as-a-library) case. 
They > may be hoisted above calls to fork or pthread_create, they may be moved > into global constructors (and thus can't depend on global state), etc. > However, since you have a specific library you want to generate code > against, you have the power to make use of it. I don't expect clang or > dragonegg to be able to make use of it. > > Nick > > -----Original Message----- > From: Nick Lewycky [mailto:nicholas at mxc.ca ] > Sent: Monday, July 22, 2013 10:24 > To: Kuperstein, Michael M > Cc: Andrew Trick; llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] Does nounwind have semantics? > > Kuperstein, Michael M wrote: > > I'm not sure I understand why it's blocked on that, by the way. > > It blocks our ability to automatically deduce the halting attribute > in the optimizer, which was necessary for the use case I had at the > time. > If you have a use case of your own, feel free to propose the patch! > > (Technically it's not *blocked* -- see how my patch does it! -- but > the workarounds are too horrible to be committed.) > > > Even if we can't apply the attribute ourselves, I don't see why > we wouldn't expose that ability to frontends. > > Frontends are free to put attributes on functions if they want to. > Go for it! > > > I'm not entirely sure "halting" is the right attribute either, by > the way. > > What I, personally, would like to see is a way to specify a > function call is safe to speculatively execute. That implies > readnone (not just readonly), nounwind, halting - and Eris knows > what else. Nick, is that too strong for you? > > I strongly prefer the approach of having orthogonal attributes. > There are optimizations that you can do with each of these > attributes on their own. In particular I think that > readonly+halting+nounwind+nolongjmp is going to be common and I'd > feel silly if we had a special case for > readnone+halting+nounwind+nolongjmp and thus couldn't optimize the more > common case. 
> > That said, I'm also going to feel silly if we don't end up with enough > attributes to allow isSafeToSpeculate to deduce it, which is where we > are right now. I was planning to get back to fixing this after > Chandler's promised PassManager work. > > Nick > > > > > Michael > > > > -----Original Message----- > > From: llvmdev-bounces at cs.uiuc.edu > > [mailto:llvmdev-bounces at cs.uiuc.edu > ] On Behalf Of Nick Lewycky > > Sent: Monday, July 22, 2013 07:08 > > To: Andrew Trick > > Cc: llvmdev at cs.uiuc.edu > > Subject: Re: [LLVMdev] Does nounwind have semantics? > > > > Andrew Trick wrote: > >> Does 'nounwind' have semantics that inform optimization passes? > It seems to in some cases, but not consistently. For example... > >> > >> int32_t foo(int32_t* ptr) { > >> int i = 0; > >> int result; > >> do { > >> bar(ptr); > >> result = *ptr; > >> bar(ptr); > >> } while (i++< *ptr); > >> return result; > >> } > >> > >> Say we have a front end that declares bar as... > >> > >> declare void @bar(i32*) readonly; > >> > >> So 'bar' is 'readonly' and 'may-unwind'. > >> > >> When LICM tries to hoist the load it interprets the 'may-unwind' > as "MayThrow" in LICM-language and bails. However, when it tries to > sink the call itself it sees the 'readonly', assumes no side effects > and sinks it below the loads. Hmm... > >> > >> There doesn't appear to be a way to declare a function that is > guaranteed not to write to memory in a way that affects the caller, > but may have another well-defined side effect like aborting the > program. This is interesting, because that is the way runtime checks > for safe languages would like to be defined. I'm perfectly happy > telling front ends to generate control flow for well-defined traps, > since I like lots of basic blocks in my IR. But I'm still curious > how others deal with this. > > > > Yes, we went through a phase where people would try to use > "nounwind+readonly == no side-effects" to optimize. 
All such > optimizations are wrong. Unless otherwise proven, a function may > inf-loop, terminate the program, or longjmp. > > > > I tried to add 'halting' to help solve part of this a long time > ago, but it never went in. The problem is that determining whether > you have loops requires a FunctionPass (LoopInfo to find loops and > SCEV to determine an upper bound) and applying function attributes > is an SCC operation (indeed, an SCC is itself a loop), so it's all > blocked behind fixing the PassManager to allow CGSGGPasses to depend > on FunctionPasses. > > > http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20100705/103670.html > > > > I'm now in a similar situation where I want 'nounwind' to mean > "only exits by terminating the program or a return instruction" but > unfortunately functions which longjmp are considered nounwind. I > would like to change llvm to make longjmp'ing a form of unwinding > (an exceptional exit to the function), but if I were to apply that > rule today then we'd start putting dwarf eh tables on all our C > code, oops. > > > > Nick > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu > http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > --------------------------------------------------------------------- > > Intel Israel (74) Limited > > > > This e-mail and any attachments may contain confidential material for > > the sole use of the intended recipient(s). Any review or distribution > > by others is strictly prohibited. If you are not the intended > > recipient, please contact the sender and delete all copies. > > > > > > --------------------------------------------------------------------- > Intel Israel (74) Limited > > This e-mail and any attachments may contain confidential material for > the sole use of the intended recipient(s). Any review or distribution > by others is strictly prohibited. 
If you are not the intended > recipient, please contact the sender and delete all copies. > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > --------------------------------------------------------------------- > Intel Israel (74) Limited > > This e-mail and any attachments may contain confidential material for > the sole use of the intended recipient(s). Any review or distribution > by others is strictly prohibited. If you are not the intended > recipient, please contact the sender and delete all copies. > From michael.m.kuperstein at intel.com Thu Jul 25 02:14:46 2013 From: michael.m.kuperstein at intel.com (Kuperstein, Michael M) Date: Thu, 25 Jul 2013 09:14:46 +0000 Subject: [LLVMdev] Does nounwind have semantics? In-Reply-To: <51F0EB8B.10406@mxc.ca> References: <58281658-BDAE-4D12-859A-859308EE6FF8@apple.com> <51ECB01C.5020605@mxc.ca> <251BD6D4E6A77E4586B482B33960D2283360813B@HASMSX106.ger.corp.intel.com> <51ECDE23.2020004@mxc.ca> <251BD6D4E6A77E4586B482B33960D22833608266@HASMSX106.ger.corp.intel.com> <251BD6D4E6A77E4586B482B33960D2283361BB88@HASMSX106.ger.corp.intel.com> <51F0EB8B.10406@mxc.ca> Message-ID: <251BD6D4E6A77E4586B482B33960D2283361BCA1@HASMSX106.ger.corp.intel.com> Right, will fix the CallInst, copy/pasted from a different case and didn't notice what I was doing, thanks. Re LangRef.rst, that's what I meant, I'm still hoping for better suggestions regarding the name... As to the conflict - Tobias, feel free to go first, I'll merge. -----Original Message----- From: Nick Lewycky [mailto:nicholas at mxc.ca] Sent: Thursday, July 25, 2013 12:11 To: Kuperstein, Michael M Cc: Nick Lewycky; llvmdev at cs.uiuc.edu; Tobias Grosser; echristo at gmail.com Subject: Re: [LLVMdev] Does nounwind have semantics? Kuperstein, Michael M wrote: > A patch is attached. 
+  const CallInst* CI = dyn_cast<CallInst>(Inst);
+  return CI->isSafeToSpeculativelyExecute();

"return cast<CallInst>(Inst)->isSafeToSpeculativelyExecute();"? Use cast<> instead of dyn_cast<>. See http://llvm.org/docs/ProgrammersManual.html#isa . Then I don't think it needs to be two lines. You can even remove the extra curly braces around this case.

> Not sure I'm happy with this due to the aforementioned orthogonality
> concerns, but I really don't have any better ideas. If anyone does,
> feel free to offer them, I don't mind throwing this patch into the trash.
>
> (Also, not happy with the name, using "speculatable" as Nick
> suggested, for the lack of other options. If the name stays I'll add
> it to the
> documentation.)

That reminds me, the patch needs to come with an update to LangRef.rst.

> Regarding auditing the intrinsics - I'd prefer to do this in stages.

Sounds fine. I'd like Eric to take a quick look and agree that marking debug intrinsics speculatable is sane. (Yes they already are, but that doesn't mean it's sane. I also want Eric to know that 'speculatable' is going to start showing up in his debug info.)

One other thing, your patch and Tobias Grosser's patch (subject "Make .bc en/decoding of AttrKind stable") are in conflict. Whoever lands second will need to merge.

Nick

> Here I'm just preserving the current behavior by marking intrinsics
> that used to be explicitly handled in isSafeToSpeculativelyExecute(),
> so there should be no functional change.
>
> *From:* Nick Lewycky [mailto:nlewycky at google.com]
> *Sent:* Tuesday, July 23, 2013 02:29
> *To:* Kuperstein, Michael M
> *Cc:* Nick Lewycky; llvmdev at cs.uiuc.edu
> *Subject:* Re: [LLVMdev] Does nounwind have semantics?
>
> On 22 July 2013 01:11, Kuperstein, Michael M wrote:
>
> Of course frontends are free to put attributes, but it would be nice
> if optimizations actually used them.
;-) > My use case is that of proprietary frontend that happens to know > some library function calls - which are only resolved at link time - > have no side effects and are safe to execute speculatively, and > wants to tell the optimizer it can move them around however it > likes. I'll gladly submit a patch that uses these hints, but I'd > like to reach some consensus on what the desired attributes actually > are first. The last thing I want is to add attributes that are only > useful to myself. > > Regarding having several orthogonal attributes vs. things like > "safetospeculate": > > To know a function is safe to speculatively execute, I need at least: > 1) readnone (readonly is insufficient, unless I know all accessed > pointers are valid) > 2) nounwind > 3) nolongjmp (I guess?) > 4) no undefined behavior. This includes things like "halting" and > "no division by zero", but that's not, by far, an exhaustive list. > > I guess there are several ways to handle (4). > Ideally, I agree with you, we'd like a set of orthogonal attributes > that, taken together, imply that the function's behavior is not > undefined. > But that requires mapping all sources of undefined behavior (I don't > think this is currently documented for LLVM IR, at least not in a > centralized fashion) and adding a very specific attribute for each > of them. I'm not sure having function declarations with "readnone, > nounwind, nolongjmp, halting, nodivbyzero, nopoisonval, > nocomparelabels, nounreachable, ..." is desirable. > > We could also have a "welldefined" attribute and a "halting" > attribute where "welldefined" subsumes "halting", if the specific > case of a function which halts but may have undefined behavior is > important. > While the two are not orthogonal, it's similar to the situation with > "readnone" and "readonly". Does that sound reasonable? > > You're entirely right. I forgot about undefined behaviour. > > If you want a 'speculatable' attribute, I would review that patch. 
> Please audit the intrinsics (at least the target-independent ones) and > appropriate library functions for whether you can apply this attribute > to them. I think the only optimization that it can trigger is that > "isSafeToSpeculativelyExecute" returns true on it. Anything else? Is > it safe to infer readnone and nounwind from speculatable? > > I should mention that speculatable functions are extraordinarily > limited in what they can do in the general (non-LLVM-as-a-library) > case. They may be hoisted above calls to fork or pthread_create, they > may be moved into global constructors (and thus can't depend on global state), etc. > However, since you have a specific library you want to generate code > against, you have the power to make use of it. I don't expect clang or > dragonegg to be able to make use of it. > > Nick > > -----Original Message----- > From: Nick Lewycky [mailto:nicholas at mxc.ca ] > Sent: Monday, July 22, 2013 10:24 > To: Kuperstein, Michael M > Cc: Andrew Trick; llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] Does nounwind have semantics? > > Kuperstein, Michael M wrote: > > I'm not sure I understand why it's blocked on that, by the way. > > It blocks our ability to automatically deduce the halting attribute > in the optimizer, which was necessary for the use case I had at the > time. > If you have a use case of your own, feel free to propose the patch! > > (Technically it's not *blocked* -- see how my patch does it! -- but > the workarounds are too horrible to be committed.) > > > Even if we can't apply the attribute ourselves, I don't see why > we wouldn't expose that ability to frontends. > > Frontends are free to put attributes on functions if they want to. > Go for it! > > > I'm not entirely sure "halting" is the right attribute either, by > the way. > > What I, personally, would like to see is a way to specify a > function call is safe to speculatively execute. 
That implies > readnone (not just readonly), nounwind, halting - and Eris knows > what else. Nick, is that too strong for you? > > I strongly prefer the approach of having orthogonal attributes. > There are optimizations that you can do with each of these > attributes on their own. In particular I think that > readonly+halting+nounwind+nolongjmp is going to be common and I'd > feel silly if we had a special case for > readnone+halting+nounwind+nolongjmp and thus couldn't optimize the more > common case. > > That said, I'm also going to feel silly if we don't end up with enough > attributes to allow isSafeToSpeculate to deduce it, which is where we > are right now. I was planning to get back to fixing this after > Chandler's promised PassManager work. > > Nick > > > > > Michael > > > > -----Original Message----- > > From: llvmdev-bounces at cs.uiuc.edu > > [mailto:llvmdev-bounces at cs.uiuc.edu > ] On Behalf Of Nick Lewycky > > Sent: Monday, July 22, 2013 07:08 > > To: Andrew Trick > > Cc: llvmdev at cs.uiuc.edu > > Subject: Re: [LLVMdev] Does nounwind have semantics? > > > > Andrew Trick wrote: > >> Does 'nounwind' have semantics that inform optimization passes? > It seems to in some cases, but not consistently. For example... > >> > >> int32_t foo(int32_t* ptr) { > >> int i = 0; > >> int result; > >> do { > >> bar(ptr); > >> result = *ptr; > >> bar(ptr); > >> } while (i++< *ptr); > >> return result; > >> } > >> > >> Say we have a front end that declares bar as... > >> > >> declare void @bar(i32*) readonly; > >> > >> So 'bar' is 'readonly' and 'may-unwind'. > >> > >> When LICM tries to hoist the load it interprets the 'may-unwind' > as "MayThrow" in LICM-language and bails. However, when it tries to > sink the call itself it sees the 'readonly', assumes no side effects > and sinks it below the loads. Hmm... 
> >> > >> There doesn't appear to be a way to declare a function that is > guaranteed not to write to memory in a way that affects the caller, > but may have another well-defined side effect like aborting the > program. This is interesting, because that is the way runtime checks > for safe languages would like to be defined. I'm perfectly happy > telling front ends to generate control flow for well-defined traps, > since I like lots of basic blocks in my IR. But I'm still curious > how others deal with this. > > > > Yes, we went through a phase where people would try to use > "nounwind+readonly == no side-effects" to optimize. All such > optimizations are wrong. Unless otherwise proven, a function may > inf-loop, terminate the program, or longjmp. > > > > I tried to add 'halting' to help solve part of this a long time > ago, but it never went in. The problem is that determining whether > you have loops requires a FunctionPass (LoopInfo to find loops and > SCEV to determine an upper bound) and applying function attributes > is an SCC operation (indeed, an SCC is itself a loop), so it's all > blocked behind fixing the PassManager to allow CGSCCPasses to depend > on FunctionPasses. > > > http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20100705/103670.html > > > > I'm now in a similar situation where I want 'nounwind' to mean > "only exits by terminating the program or a return instruction" but > unfortunately functions which longjmp are considered nounwind. I > would like to change llvm to make longjmp'ing a form of unwinding > (an exceptional exit to the function), but if I were to apply that > rule today then we'd start putting dwarf eh tables on all our C > code, oops.
> > > > Nick > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu > http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > --------------------------------------------------------------------- > > Intel Israel (74) Limited > > > > This e-mail and any attachments may contain confidential material for > > the sole use of the intended recipient(s). Any review or distribution > > by others is strictly prohibited. If you are not the intended > > recipient, please contact the sender and delete all copies. > > > > > > --------------------------------------------------------------------- > Intel Israel (74) Limited > > This e-mail and any attachments may contain confidential material for > the sole use of the intended recipient(s). Any review or distribution > by others is strictly prohibited. If you are not the intended > recipient, please contact the sender and delete all copies. > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > --------------------------------------------------------------------- > Intel Israel (74) Limited > > This e-mail and any attachments may contain confidential material for > the sole use of the intended recipient(s). Any review or distribution > by others is strictly prohibited. If you are not the intended > recipient, please contact the sender and delete all copies. > --------------------------------------------------------------------- Intel Israel (74) Limited This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. 
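For readers skimming the thread, Andrew's C example above corresponds to IR along these lines (a hand-written 3.3-era sketch, not actual clang output):

```llvm
; Hand-written sketch of the hazard discussed above: @bar is readonly but
; may still unwind, so LICM refuses to hoist the load past it, while the
; readonly attribute alone is enough to let the call itself be sunk.
declare void @bar(i32*) readonly

define i32 @foo(i32* %ptr) {
entry:
  br label %loop

loop:
  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  call void @bar(i32* %ptr)
  %result = load i32* %ptr      ; not hoisted: the may-unwind call precedes it
  call void @bar(i32* %ptr)     ; but this call may be sunk below the loads
  %i.next = add nsw i32 %i, 1
  %bound = load i32* %ptr
  %cond = icmp slt i32 %i, %bound
  br i1 %cond, label %loop, label %exit

exit:
  ret i32 %result
}
```

A 'speculatable'-style attribute as discussed in the thread would additionally promise that such a call is safe to execute with no guard at all.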
From brianherman at gmail.com Thu Jul 25 02:16:38 2013 From: brianherman at gmail.com (Brian Herman) Date: Thu, 25 Jul 2013 04:16:38 -0500 Subject: [LLVMdev] ubuntu on the mac In-Reply-To: References: <51EEF351.9030109@mips.com> Message-ID: What you could do is single boot Ubuntu using a program called refit. On Thu, Jul 25, 2013 at 2:49 AM, Jacob Carlborg wrote: > On 2013-07-24 09:47, Tyler Hardin wrote: > > Not much slower. VBox does an amazing job at getting near native >> performance on modern machines (those with nested paging etc.). This is >> definitely the best option if your computer has ~2g ram and 2+ cores. >> Give the Ubuntu VM 2g and 1 (maybe 2) core/s and it should be fine. >> > > At work, it takes significantly longer to boot our Ruby on Rails > application on a virtual machine than natively. I also noticed that disk > access can be quite slow on a virtual machine compared to native. > > > -- > /Jacob Carlborg > > ______________________________**_________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/**mailman/listinfo/llvmdev > -- Thanks, Brian Herman college.nfshost.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From rkotler at mips.com Wed Jul 24 22:02:16 2013 From: rkotler at mips.com (reed kotler) Date: Wed, 24 Jul 2013 22:02:16 -0700 Subject: [LLVMdev] disabling fastcc Message-ID: <51F0B158.1060908@mips.com> Is there a way to disable the generation of fastcc from either the clang step or if run as two separate steps, the llc step? Reed From rkotler at mips.com Wed Jul 24 23:12:13 2013 From: rkotler at mips.com (Reed Kotler) Date: Wed, 24 Jul 2013 23:12:13 -0700 Subject: [LLVMdev] static functions and optimization In-Reply-To: <51F05E3C.4080807@mips.com> References: <51F058F1.2010100@mips.com> <51F05E3C.4080807@mips.com> Message-ID: <51F0C1BD.5040104@mips.com> Seems like -femit-all-decls partially works around this. 
But I would still like to solve the real problem of creating a function which is local/static but which cannot be thrown away by the optimizer if not referenced. On 07/24/2013 04:07 PM, Reed Kotler wrote: > Maybe there is some attribute I can add that will not allow the function > to be discarded. > > On 07/24/2013 03:45 PM, reed kotler wrote: >> I have some stub functions that are essentially static, but they cannot >> be removed. >> >> What linkage type should I use in that case. Internal linkage appears to >> get the functions discarded if they are not referenced under certain >> optimizations. >> >> This is part of the gcc mips16 hack for floating point. >> >> For example, a function like floor from the math library will be called >> with an external reference to function floor. >> >> At that time, the compiler does not know whether floor was compiled as >> mips16 or mips32. >> >> It generates the call to floor as if it is a mips16 compiled function. >> >> It also generates a stub that the linker can use if during link time it >> is discovered that "floor" is a mips32 function. >> >> The stubs, which are mips32 code, will move the first argument register >> into a floating point register, call floor, and upon return will move >> the integer return value into a floating point register. >> >> If I designate this function as having internal linkage, then in some >> cases, the optimizer will throw it away. >> >> In that case, at link time it will call floor, and if floor is compiled >> as mips32, this will break a mips16 compiled program. >> >> The linker does not know there is an issue because only functions with >> certain kinds of signatures need a helper function for the call.
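Rafael's llvm.used suggestion further down the thread looks roughly like this at the IR level (hand-written sketch; the stub name here is invented for illustration):

```llvm
; Hand-written sketch: listing an internal function in @llvm.used tells
; the optimizer that it must be kept even though nothing in the module
; references it, so an internal/static stub survives to link time.
define internal float @__stub_floorf(float %x) {
entry:
  %r = call float @floorf(float %x)
  ret float %r
}

declare float @floorf(float)

@llvm.used = appending global [1 x i8*] [i8* bitcast (float (float)* @__stub_floorf to i8*)], section "llvm.metadata"
```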
From rkotler at mips.com Wed Jul 24 23:35:25 2013 From: rkotler at mips.com (Reed Kotler) Date: Wed, 24 Jul 2013 23:35:25 -0700 Subject: [LLVMdev] Clang/LLVM 3.3 unwanted attributes being added: NoFramePointerElim In-Reply-To: References: Message-ID: <51F0C72D.2050701@mips.com> With the new attribute system, there are some subtle differences in certain cases. Some things are now determined by clang and passed to llc whereas previously this only could happen if clang/llvm were run as one executable. But now that same information can be added to attributes and passed to llc without you having to add command line parameters for those to llc. Which compiler are you running? You are running clang and llc separately? On 07/24/2013 11:16 PM, Dan wrote: > Since updating to LLVM 3.3, the system is generating attributes such as: > > attributes #0 = { nounwind "less-precise-fpmad"="false" > "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" > "no-infs-fp-math"="false" "no-nans-fp-math"="false" > "unsafe-fp-math"="false" "use-soft-float"="false" } > > > I've tried to add options. > > I've tracked the code to: > > NoFramePointerElim > > I've seen the description of: > > ./lib/Target/TargetMachine.cpp > > RESET_OPTION(NoFramePointerElim, "no-frame-pointer-elim"); > RESET_OPTION(NoFramePointerElimNonLeaf, "no-frame-pointer-elim-non-leaf"); > RESET_OPTION(LessPreciseFPMADOption, "less-precise-fpmad"); > RESET_OPTION(UnsafeFPMath, "unsafe-fp-math"); > RESET_OPTION(NoInfsFPMath, "no-infs-fp-math"); > RESET_OPTION(NoNaNsFPMath, "no-nans-fp-math"); > RESET_OPTION(UseSoftFloat, "use-soft-float"); > RESET_OPTION(DisableTailCalls, "disable-tail-calls"); > > I cannot find the code or mechanism to turn off: NoFramePointerElim > > No code generator has these specialized, so this is happening for all targets. > > Any help?
> From rkotler at mips.com Wed Jul 24 23:38:25 2013 From: rkotler at mips.com (Reed Kotler) Date: Wed, 24 Jul 2013 23:38:25 -0700 Subject: [LLVMdev] Clang/LLVM 3.3 unwanted attributes being added: NoFramePointerElim In-Reply-To: References: Message-ID: <51F0C7E1.603@mips.com> Maybe this would be interesting to you: http://llvm.org/viewvc/llvm-project?view=revision&revision=187093 On 07/24/2013 11:16 PM, Dan wrote: > Since updating to LLVM 3.3, the system is generating attributes such as: > > attributes #0 = { nounwind "less-precise-fpmad"="false" > "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" > "no-infs-fp-math"="false" "no-nans-fp-math"="false" > "unsafe-fp-math"="false" "use-soft-float"="false" } > > > I've tried to add options. > > I've tracked the code to: > > NoFramePointerElim > > I've seen the description of: > > ./lib/Target/TargetMachine.cpp > > RESET_OPTION(NoFramePointerElim, "no-frame-pointer-elim"); > RESET_OPTION(NoFramePointerElimNonLeaf, "no-frame-pointer-elim-non-leaf"); > RESET_OPTION(LessPreciseFPMADOption, "less-precise-fpmad"); > RESET_OPTION(UnsafeFPMath, "unsafe-fp-math"); > RESET_OPTION(NoInfsFPMath, "no-infs-fp-math"); > RESET_OPTION(NoNaNsFPMath, "no-nans-fp-math"); > RESET_OPTION(UseSoftFloat, "use-soft-float"); > RESET_OPTION(DisableTailCalls, "disable-tail-calls"); > > I cannot find the code or mechanism to turn off: NoFramePointerElim > > No code generator has these specialized, so this is happening for all targets. > > Any help? 
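Given the RESET_OPTION mechanism Dan quotes, llc rebuilds its TargetOptions from the per-function attribute strings, so the attribute in the .ll wins over the command line. One workaround is therefore to flip the string attribute itself in the IR (hand-written sketch):

```llvm
; Hand-written sketch: with the RESET_OPTION mechanism quoted above,
; changing the per-function string is what actually changes codegen.
define i32 @f() #0 {
entry:
  ret i32 0
}

attributes #0 = { nounwind "no-frame-pointer-elim"="false" }
```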
> From baldrick at free.fr Thu Jul 25 04:51:24 2013 From: baldrick at free.fr (Duncan Sands) Date: Thu, 25 Jul 2013 13:51:24 +0200 Subject: [LLVMdev] First Pass at building dragon egg-3.3 for clang 3.3 - using gcc-4.7 In-Reply-To: <599444808.1137181.1374719914773.JavaMail.root@sz0093a.westchester.pa.mail.comcast.net> References: <599444808.1137181.1374719914773.JavaMail.root@sz0093a.westchester.pa.mail.comcast.net> Message-ID: <51F1113C.7020809@free.fr> Hi, On 25/07/13 04:38, Lou Picciano wrote: > LLVM Friends, > > First time attempting a build of dragonegg, using our shiny new install of > clang-3.3. I'm clearly off to a terrifying start! > > > dragonegg-3.3.src$ CXX=/usr/bin/gcc This says to compile using gcc GCC=/usr/bin/gcc This says that the plugin will be used with /usr/bin/gcc when built. ENABLE_LLVM_PLUGINS=1 > LLVM_CONFIG=/usr/bin/llvm-config CFLAGS=-I/usr/clang/3.3/lib/clang/3.3/include > CXXFLAGS="-I/usr/clang/3.3/lib/clang/3.3/include" These say to use clang headers. Not sure why you want to use clang headers when compiling using gcc. make > > Compiling utils/TargetInfo.cpp > In file included from /usr/clang/3.3/include/llvm/Support/DataTypes.h:67:0, > from /usr/clang/3.3/include/llvm/Support/type_traits.h:20, > from /usr/clang/3.3/include/llvm/ADT/StringRef.h:13, > from /usr/clang/3.3/include/llvm/ADT/Twine.h:13, > from /usr/clang/3.3/include/llvm/ADT/Triple.h:13, > from > /home/drlou/Downloads/dragonegg-3.3.src/utils/TargetInfo.cpp:23: > /usr/clang/3.3/lib/clang/3.3/include/stdint.h:32:54: error: missing binary > operator before token "(" This seems to be saying that gcc doesn't like the clang headers. I suggest you don't use them. Anyway, not sure why you are trying to do this so complicated, doesn't this work: make ? 
Other comments: CXX is for providing a C++ compiler, so should be g++ not gcc Since /usr/bin/gcc is in your path, probably you don't need CXX to force this compiler (or did you put clang in front of it in your path?; if so it still shouldn't be needed since clang can also compile dragonegg) GCC=/usr/bin/gcc is probably not needed, since it is the default Ciao, Duncan. > /usr/clang/3.3/lib/clang/3.3/include/stdint.h:187:0: warning: "__int_least32_t" > redefined [enabled by default] > /usr/clang/3.3/lib/clang/3.3/include/stdint.h:113:0: note: this is the location > of the previous definition > /usr/clang/3.3/lib/clang/3.3/include/stdint.h:188:0: warning: "__uint_least32_t" > redefined [enabled by default] > /usr/clang/3.3/lib/clang/3.3/include/stdint.h:114:0: note: this is the location > of the previous definition > /usr/clang/3.3/lib/clang/3.3/include/stdint.h:189:0: warning: "__int_least16_t" > redefined [enabled by default] > /usr/clang/3.3/lib/clang/3.3/include/stdint.h:115:0: note: this is the location > of the previous definition > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From David.Chisnall at cl.cam.ac.uk Thu Jul 25 04:53:56 2013 From: David.Chisnall at cl.cam.ac.uk (David Chisnall) Date: Thu, 25 Jul 2013 12:53:56 +0100 Subject: [LLVMdev] ubuntu on the mac In-Reply-To: References: <51EEF351.9030109@mips.com> Message-ID: <639F5E41-818B-4C89-B1D5-ED9334BC2C24@cl.cam.ac.uk> On 25 Jul 2013, at 08:49, Jacob Carlborg wrote: > On 2013-07-24 09:47, Tyler Hardin wrote: > >> Not much slower. VBox does an amazing job at getting near native >> performance on modern machines (those with nested paging etc.). This is >> definitely the best option if your computer has ~2g ram and 2+ cores. >> Give the Ubuntu VM 2g and 1 (maybe 2) core/s and it should be fine. 
> > At work, it takes significantly longer to boot our Ruby on Rails application on a virtual machine than natively. I also noticed that disk access can be quite slow on a virtual machine compared to native. It takes a bit less than 10 minutes for me to do a full build of LLVM+Clang in a FreeBSD VM with 2GB of RAM allocated on my MacBook Pro. The I/O speed is slower than native, but it is still faster than building natively on the Core i5-based Ubuntu box that I tried recently (both machines have SSDs - if you're doing a lot of builds, mechanical disks will cripple your productivity). Of course, the 24 core machine we have in a rack with 256GB of RAM is noticeably faster - it does a clean build in 3 minutes - but it's less convenient to carry around (and a lot louder!). David From rasha.sala7 at gmail.com Thu Jul 25 05:12:42 2013 From: rasha.sala7 at gmail.com (Rasha Omar) Date: Thu, 25 Jul 2013 14:12:42 +0200 Subject: [LLVMdev] Error for AllocaInst and Instruction Message-ID: Hi, For the following code const Type * Int32Type = IntegerType::getInt32Ty(getGlobalContext()); AllocaInst* newInst = new AllocaInst(Int32Type, 0, "flag", Bb); Bb->getInstList().push_back(newInst); It gives me the error " error: no matching constructor for initialization of 'llvm::AllocaInst' AllocaInst* newInst = new AllocaInst(Int32Type, 0, "flag", Bb);" By using Instruction const Type * Int32Type = IntegerType::getInt32Ty(getGlobalContext()); Instruction* newInst = new Instruction(Int32Type, 0, "flag", Bb); Bb->getInstList().push_back(newInst); error: allocating an object of abstract class type 'llvm::Instruction' Instruction* newInst = new Instruction(Int32Type, 0, "flag", Bb); -- *Rasha Salah Omar Msc Student at E-JUST Demonestrator at Faculty of Computers and Informatics Benha University * -------------- next part -------------- An HTML attachment was scrubbed... 
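The "no matching constructor" error above comes from the const: the AllocaInst constructors take a non-const Type*. A sketch of the usual fix against the 3.3-era C++ API (not compiled here):

```cpp
// Sketch (LLVM 3.3-era API, untested): AllocaInst's constructors take a
// non-const Type*, so dropping the const fixes overload resolution.
Type *Int32Type = IntegerType::getInt32Ty(getGlobalContext());
AllocaInst *newInst = new AllocaInst(Int32Type, 0, "flag", Bb);
// Note: passing Bb as the insert-at-end block already links the new
// instruction into Bb, so the extra getInstList().push_back(newInst)
// would insert it a second time and should be dropped.
```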
URL: From rasha.sala7 at gmail.com Thu Jul 25 05:14:58 2013 From: rasha.sala7 at gmail.com (Rasha Omar) Date: Thu, 25 Jul 2013 14:14:58 +0200 Subject: [LLVMdev] Error for AllocaInst and Instruction In-Reply-To: References: Message-ID: Bb >> BasicBlock* On 25 July 2013 14:12, Rasha Omar wrote: > Hi, > For the following code > const Type * Int32Type = > IntegerType::getInt32Ty(getGlobalContext()); > AllocaInst* newInst = new AllocaInst(Int32Type, 0, "flag", Bb); > Bb->getInstList().push_back(newInst); > > It gives me the error > " error: no matching constructor for initialization of 'llvm::AllocaInst' > AllocaInst* newInst = new AllocaInst(Int32Type, 0, "flag", > Bb);" > > > By using Instruction > > const Type * Int32Type = > IntegerType::getInt32Ty(getGlobalContext()); > Instruction* newInst = new Instruction(Int32Type, 0, "flag", Bb); > Bb->getInstList().push_back(newInst); > > > error: allocating an object of abstract class type 'llvm::Instruction' > Instruction* newInst = new Instruction(Int32Type, 0, "flag", > Bb); > > > -- > *Rasha Salah Omar > Msc Student at E-JUST > Demonestrator at Faculty of Computers and Informatics > Benha University > * > -- *Rasha Salah Omar Msc Student at E-JUST Demonestrator at Faculty of Computers and Informatics Benha University * -------------- next part -------------- An HTML attachment was scrubbed... URL: From rkotler at mips.com Thu Jul 25 05:08:01 2013 From: rkotler at mips.com (Reed Kotler) Date: Thu, 25 Jul 2013 05:08:01 -0700 Subject: [LLVMdev] static functions and optimization In-Reply-To: <51F058F1.2010100@mips.com> References: <51F058F1.2010100@mips.com> Message-ID: <51F11521.9080402@mips.com> On 07/24/2013 03:45 PM, reed kotler wrote: > I have some stub functions that are essentially static, but they cannot > be removed. > > What linkage type should I use in that case. Internal linkage appears to > get the functions discarded if they are not referenced under certain > optimizations. 
> > This is part of the gcc mips16 hack for floating point. > > For example, a function like floor from the math library will be called > with an external reference to function floor. > > At that time, the compiler does not know whether floor was compiled as > mips16 or mips32. > This description below is not completely correct. But the issue remains nevertheless that some of the static function stubs are being optimized below. I have filed a bug against myself to finally document this mips16/32 floating point interoperability scheme. Even after implementing it, I forget parts of how it works if I have not worked on it for a while. > It generates the call to floor as if it is a mips16 compiled function. > > It also generates a stub that the linker can use if during link time it > is discovered that "floor" is a mips32 function. > > The stubs, which are mips32 code, will move the first argument register > into a floating point register, call floor, and upon return will move > the integer return value into a floating point register. > > If I designate this function as having internal linkage, then in some > cases, the optimizer will throw it away. > > In that case, at link time it will call floor, and if floor is compiled > as mips32, this will break a mips16 compiled program. > > The linker does not know there is an issue because only functions with > certain kinds of signatures need a helper function for the call. From rafael.espindola at gmail.com Thu Jul 25 05:16:48 2013 From: rafael.espindola at gmail.com (=?UTF-8?Q?Rafael_Esp=C3=ADndola?=) Date: Thu, 25 Jul 2013 08:16:48 -0400 Subject: [LLVMdev] disabling fastcc In-Reply-To: <51F0B158.1060908@mips.com> References: <51F0B158.1060908@mips.com> Message-ID: -fast-isel=false On 25 July 2013 01:02, reed kotler wrote: > Is there a way to disable the generation of fastcc from either the clang > step or if run as two separate steps, the llc step? 
> > Reed > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From rafael.espindola at gmail.com Thu Jul 25 05:18:41 2013 From: rafael.espindola at gmail.com (=?UTF-8?Q?Rafael_Esp=C3=ADndola?=) Date: Thu, 25 Jul 2013 08:18:41 -0400 Subject: [LLVMdev] static functions and optimization In-Reply-To: <51F0C1BD.5040104@mips.com> References: <51F058F1.2010100@mips.com> <51F05E3C.4080807@mips.com> <51F0C1BD.5040104@mips.com> Message-ID: llvm.used should do it. On 25 July 2013 02:12, Reed Kotler wrote: > Seems like -femit-all-decls partially works around this. > > But I would still like to solve the real problem of creating a function > which is local/static but which cannot be thrown away by the optimizer if > not referenced. > > > > On 07/24/2013 04:07 PM, Reed Kotler wrote: >> >> Maybe there is some attribute I can add this will not allow the function >> to be discarded. >> >> On 07/24/2013 03:45 PM, reed kotler wrote: >>> >>> I have some stub functions that are essentially static, but they cannot >>> be removed. >>> >>> What linkage type should I use in that case. Internal linkage appears to >>> get the functions discarded if they are not referenced under certain >>> optimizations. >>> >>> This is part of the gcc mips16 hack for floating point. >>> >>> For example, a function like floor from the math library will be called >>> with an external reference to function floor. >>> >>> At that time, the compiler does not know whether floor was compiled as >>> mips16 or mips32. >>> >>> It generates the call to floor as if it is a mips16 compiled function. >>> >>> It also generates a stub that the linker can use if during link time it >>> is discovered that "floor" is a mips32 function. 
>>> >>> The stubs, which are mips32 code, will move the first argument register >>> into a floating point register, call floor, and upon return will move >>> the integer return value into a floating point register. >>> >>> If I designate this function as having internal linkage, then in some >>> cases, the optimizer will throw it away. >>> >>> In that case, at link time it will call floor, and if floor is compiled >>> as mips32, this will break a mips16 compiled program. >>> >>> The linker does not know there is an issue because only functions with >>> certain kinds of signatures need a helper function for the call. > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From loupicciano at comcast.net Thu Jul 25 06:47:18 2013 From: loupicciano at comcast.net (Lou Picciano) Date: Thu, 25 Jul 2013 13:47:18 +0000 (UTC) Subject: [LLVMdev] First Pass at building dragon egg-3.3 for clang 3.3 - using gcc-4.7 Message-ID: <173045026.1146727.1374760038471.JavaMail.root@sz0093a.westchester.pa.mail.comcast.net> Duncan, Many thanks for your comments. The core issue we're running into is this: $ GCC=/usr/bin/gcc LLVM_CONFIG=/usr/bin/llvm-config make Compiling utils/TargetInfo.cpp Linking TargetInfo ld: fatal: library -lLLVMSupport: not found ld: fatal: file processing errors. No output written to TargetInfo collect2: error: ld returned 1 exit status

All other gyrations are attempts to shoehorn LLVMSupport into the compile. I've been sourcing the Makefile and README for hints.
To your point on declaring the location of gcc: Oddly, though gcc (4.7) is available to the environment, the Makefile does not find it without explicit direction: dragonegg-3.3.src$ make make: cc: Command not found

Thanks in advance, Lou Picciano (and apologies for the top posting; crappy email client) On 25/07/13 04:38, Lou Picciano wrote: > LLVM Friends, > > First time attempting a build of dragonegg, using our shiny new install of > clang-3.3. I'm clearly off to a terrifying start! > > > dragonegg-3.3.src$ CXX=/usr/bin/gcc This says to compile using gcc GCC=/usr/bin/gcc This says that the plugin will be used with /usr/bin/gcc when built. ENABLE_LLVM_PLUGINS=1 > LLVM_CONFIG=/usr/bin/llvm-config CFLAGS=-I/usr/clang/3.3/lib/clang/3.3/include > CXXFLAGS="-I/usr/clang/3.3/lib/clang/3.3/include" These say to use clang headers. Not sure why you want to use clang headers when compiling using gcc. make > > Compiling utils/TargetInfo.cpp > In file included from /usr/clang/3.3/include/llvm/Support/DataTypes.h:67:0, > from /usr/clang/3.3/include/llvm/Support/type_traits.h:20, > from /usr/clang/3.3/include/llvm/ADT/StringRef.h:13, > from /usr/clang/3.3/include/llvm/ADT/Twine.h:13, > from /usr/clang/3.3/include/llvm/ADT/Triple.h:13, > from > /home/drlou/Downloads/dragonegg-3.3.src/utils/TargetInfo.cpp:23: > /usr/clang/3.3/lib/clang/3.3/include/stdint.h:32:54: error: missing binary > operator before token "(" This seems to be saying that gcc doesn't like the clang headers. I suggest you don't use them. Anyway, not sure why you are trying to do this so complicated, doesn't this work: make ? Other comments: CXX is for providing a C++ compiler, so should be g++ not gcc Since /usr/bin/gcc is in your path, probably you don't need CXX to force this compiler (or did you put clang in front of it in your path?; if so it still shouldn't be needed since clang can also compile dragonegg) GCC=/usr/bin/gcc is probably not needed, since it is the default Ciao, Duncan.
-------------- next part -------------- An HTML attachment was scrubbed... URL: From baldrick at free.fr Thu Jul 25 07:00:10 2013 From: baldrick at free.fr (Duncan Sands) Date: Thu, 25 Jul 2013 16:00:10 +0200 Subject: [LLVMdev] First Pass at building dragon egg-3.3 for clang 3.3 - using gcc-4.7 In-Reply-To: <173045026.1146727.1374760038471.JavaMail.root@sz0093a.westchester.pa.mail.comcast.net> References: <173045026.1146727.1374760038471.JavaMail.root@sz0093a.westchester.pa.mail.comcast.net> Message-ID: <51F12F6A.8050908@free.fr> Hi, On 25/07/13 15:47, Lou Picciano wrote: > Duncan, > Many thanks for your comments. > > The core issue we're running into is this: > > $ GCC=/usr/bin/gcc LLVM_CONFIG=/usr/bin/llvm-config make > Compiling utils/TargetInfo.cpp > Linking TargetInfo > ld: fatal: library -lLLVMSupport: not found llvm-config is supposed to say where the libraries are. Take a look at the output of usr/bin/llvm-config --ldflags On my system it outputs -L/usr/local/lib -lrt -ldl -lpthread -lz and indeed libLLVMSupport is there $ ls /usr/local/lib/libLLVMSupport* /usr/local/lib/libLLVMSupport.a > ld: fatal: file processing errors. No output written to TargetInfo > collect2: error: ld returned 1 exit status > > All other gyrations are attempts to shoehorn LLVMSupport into the compile. I've > been sourcing the Makefile and README for hints. > > To your point on declaring the location of gcc: Oddly, though gcc (4.7) is > available to the environment, the Makefile does not find it without explicit > direction: > > dragonegg-3.3.src$ make > make: cc: Command not found What O/S is this on? Most linux O/S's auto-define the CC variable to point to the system C compiler. For example on my system $ echo $CC gcc It looks like on your system CC is defined to be equal to cc but there is no such compiler. Sounds like a misconfigured system. Best wishes, Duncan. 
> > Thanks in advance, Lou Picciano > > (and apologies for the top posting; crappy email client) > > On 25/07/13 04:38, Lou Picciano wrote: >> LLVM Friends, >> >> First time attempting a build of dragonegg, using our shiny new install of >> clang-3.3. I'm clearly off to a terrifying start! >> >> >> dragonegg-3.3.src$ CXX=/usr/bin/gcc > This says to compile using gcc > GCC=/usr/bin/gcc > This says that the plugin will be used with /usr/bin/gcc when built. > ENABLE_LLVM_PLUGINS=1 >> LLVM_CONFIG=/usr/bin/llvm-config > > CFLAGS=-I/usr/clang/3.3/lib/clang/3.3/include >> CXXFLAGS="-I/usr/clang/3.3/lib/clang/3.3/include" > These say to use clang headers. Not sure why you want to use clang headers when > compiling using gcc. > make >> >> Compiling utils/TargetInfo.cpp >> In file included from /usr/clang/3.3/include/llvm/Support/DataTypes.h:67:0, >> from /usr/clang/3.3/include/llvm/Support/type_traits.h:20, >> from /usr/clang/3.3/include/llvm/ADT/StringRef.h:13, >> from /usr/clang/3.3/include/llvm/ADT/Twine.h:13, >> from /usr/clang/3.3/include/llvm/ADT/Triple.h:13, >> from >> /home/drlou/Downloads/dragonegg-3.3.src/utils/TargetInfo.cpp:23: >> /usr/clang/3.3/lib/clang/3.3/include/stdint.h:32:54: error: missing binary >> operator before token "(" > This seems to be saying that gcc doesn't like the clang headers. I suggest you > don't use them. > Anyway, not sure why you are trying to do this so complicated, doesn't this > work: > make > ? > Other comments: > CXX is for providing a C++ compiler, so should be g++ not gcc > Since /usr/bin/gcc is in your path, probably you don't need CXX to force this > compiler (or did you put clang in front of it in your path?; if so it still > shouldn't be needed since clang can also compile dragonegg) > GCC=/usr/bin/gcc is probably not needed, since it is the default > Ciao, Duncan. 
From rafael.espindola at gmail.com Thu Jul 25 07:09:42 2013
From: rafael.espindola at gmail.com (Rafael Espíndola)
Date: Thu, 25 Jul 2013 10:09:42 -0400
Subject: [LLVMdev] Deprecating and removing the MBlaze backend
In-Reply-To: <6E6EDD16-5F53-46F0-A8C9-9EF01247EBA0@apple.com>
References: <6E6EDD16-5F53-46F0-A8C9-9EF01247EBA0@apple.com>
Message-ID: 

> I say that we drop it. If someone steps up to start maintaining it, they can begin by resurrecting it from SVN.

Patches attached.

Cheers, Rafael

-------------- next part -------------- A non-text attachment was scrubbed... Name: llvm.patch.bz2 Type: application/x-bzip2 Size: 76683 bytes Desc: not available URL: 
-------------- next part -------------- A non-text attachment was scrubbed... Name: clang.patch.bz2 Type: application/x-bzip2 Size: 3650 bytes Desc: not available URL: 

From omnia at mailinator.com Thu Jul 25 09:07:17 2013
From: omnia at mailinator.com (Abhinash Jain)
Date: Thu, 25 Jul 2013 09:07:17 -0700 (PDT)
Subject: [LLVMdev] Passing String to an external function in llvm
Message-ID: <1374768437655-59798.post@n5.nabble.com>

Hi All,

In my llvm pass I have some variable named "expr" which is declared as:

  string expr; // or char *expr;

Now I want to pass this "expr" to some external function. How can I do this?

Similarly, how can I pass a variable "var" to an external function, which is declared as:

  Vector var;

Any help will be appreciated..............

-- View this message in context: http://llvm.1065342.n5.nabble.com/Passing-String-to-an-external-function-in-llvm-tp59798.html
Sent from the LLVM - Dev mailing list archive at Nabble.com.
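[The replies later in the thread cover both readings of this question. For the second reading, emitting a call to an external function from instrumented code, the generated IR conventionally looks like the sketch below. The function name @logstring, the string contents, and the wrapper function are all made up for illustration, and the syntax follows the typed-pointer IR in use at the time (LLVM 3.3):]

```llvm
; Hypothetical sketch: pass a constant string to an external function.
@.str = private unnamed_addr constant [6 x i8] c"hello\00"

declare void @logstring(i8*)

define void @example() {
entry:
  ; take the address of the first character of the global
  %ptr = getelementptr inbounds [6 x i8]* @.str, i64 0, i64 0
  call void @logstring(i8* %ptr)
  ret void
}
```

[From inside a pass, the same shape is typically produced with IRBuilder's CreateGlobalStringPtr and CreateCall; from a frontend, compiling a small C file that calls puts("hello") with clang -emit-llvm shows the identical pattern.]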
From jeremyhu at apple.com Thu Jul 25 09:21:34 2013
From: jeremyhu at apple.com (Jeremy Huddleston Sequoia)
Date: Thu, 25 Jul 2013 09:21:34 -0700
Subject: [LLVMdev] Transitioning build to cmake
In-Reply-To: 
References: <72BC7F3B-DFE5-425E-A185-3D7E1711874D@apple.com>
Message-ID: <6E4662D7-3572-4D74-9EF7-68C9319B430B@apple.com>

On Jul 24, 2013, at 14:18, Charles Davis wrote:
>
> On Jul 24, 2013, at 11:11 AM, Jeremy Huddleston Sequoia wrote:
>
>> I recently took a stab at changing the MacPorts llvm-3.4 port from the configure-based build system to the cmake-based build system.
>>
>> There are a couple of issues that I still haven't been able to work out yet and would like to know if these are just configuration issues on my side or bugs I should file at bugs.llvm.org:
>>
>>
> tl;dr: LLVM CMake support is primarily designed for Windows/Visual Studio (especially in LLVM proper) and Linux (especially in compiler-rt), and needs lots of work to work well on Mac OS X or anywhere else. In particular, it is missing many features that are present in the autotools build. (Though, as the CMake proponents 'round here are quick to point out, the autotools system is itself missing some features that are present in CMake.)

The main reason for this was that it looked like autoconf was the second-rate citizen:
1) only cmake supported building the ubsan runtime for a while: http://llvm.org/bugs/show_bug.cgi?id=14341
2) some guides only talk about building with cmake (eg http://clang.llvm.org/docs/HowToSetupToolingForLLVM.html)
3) etc...

If keeping two build systems is the long-term goal, then I'll stick with what works (but still file bugs if cmake folks want to track them). If autoconf is going away at some point, I'd prefer to make the transition while we still have the fallback rather than have a period without it being available.
> The CMake support in compiler-rt evolved in a completely different direction from the Makefiles; it was primarily designed originally, IIRC, to support building asan (and later, the other sanitizers), and mostly on Linux at that. Other platforms and configurations were an afterthought. It needs serious work--in particular, the runtime libraries are built with the host compiler (not usually a problem, unless you're building a cross compiler), and (as you've probably noticed by now) it doesn't make fat archives. Patches welcome if you can speak CMake ;). Ok. Unfortunately, I speak m4 way better than cmake. I've used it a few times, but I don't have much experience with it. >> ... > Well that's odd. I have CMake from trunk installed, and I was able to use it to build a very simple project with one C source universal. I was also able to build CMake itself universal. There's no bug in CMake--at least, not anymore. There might, however, be a bug in LLVM's build system that's causing this. Ok, I'll try to figure this one out when I get some cycles. I can probably trace where that goes wrong. FWIW, I have cmake 2.8.10.2. >> 3) Shared library >> >> The build fails if I try to build llvm using BUILD_SHARED_LIBS=ON ... the issue is that when the build tries to use the tools, dyld can't find libllvm-3.4svn.dylib because it's not yet installed. > The CMake build obviously needs to be taught to set the install names to relative paths on Mac OS, like the autotools build does. Yep ... probably because as you mentioned above, this was primarily for VisualStudio and Linux as clients. Thanks for your responses, Jeremy -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 4136 bytes Desc: not available URL: From xinliangli at gmail.com Thu Jul 25 09:22:43 2013 From: xinliangli at gmail.com (Xinliang David Li) Date: Thu, 25 Jul 2013 09:22:43 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. 
In-Reply-To: <508C534A-D051-46BA-9C20-8B6D344BA508@apple.com> References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> <508C534A-D051-46BA-9C20-8B6D344BA508@apple.com> Message-ID: It seems there is a desire to 'overload' the callback interfaces for very different purposes. IMVHO, there should be different categories of callbacks, and diagnostic callbacks should have their own. The design principle/tradeoffs of different categories can be quite different. My 2 cents. thanks, David On Wed, Jul 24, 2013 at 10:23 PM, Chris Lattner wrote: > On Jul 24, 2013, at 10:16 PM, David Blaikie wrote: >>> How about this: keep the jist of the current API, but drop the "warning"- or >>> "error"-ness of the API. Instead, the backend just includes an enum value >>> (plus string message for extra data). The frontend makes the decision of >>> how to render the diagnostic (or not, dropping them is fine) along with how >>> to map them onto warning/error or whatever concepts they use. >> >> I'm not quite clear from your suggestion whether you're suggesting the >> backend would produce a complete diagnostic string, or just the >> parameters - requiring/leaving it up to the frontend to have a full >> textual string for each backend diagnostic with the right number of >> placeholders, etc. I'm sort of in two minds about that - I like the >> idea that frontends keep all the user-rendered text (means >> localization issues are in one place, the frontend - rather than >> ending up with english text backend diagnostics rendered in a >> non-english client (yeah, pretty hypothetical, I'm not sure anyone has >> localized uses of LLVM)). But this does mean that there's no "free >> ride" - frontends must have some explicit handling of each backend >> diagnostic (some crappy worst-case fallback, but it won't be a useful >> message). > > I don't have a specific proposal in mind, other than thinking along the exact same lines as you above. 
:)

> The best approach is probably hybrid: the diagnostic producer can produce *both* a full string like today, as well as an "ID + enum" pair. This way, clang can use the latter, but llc (as one example of something we want to keep simple) could print the former, and frontends that get unknown enums could fall back on the full string.
>
>> & I don't think this avoids the desire to have non-diagnostic callbacks whenever possible (notify of interesting things, frontends can decide whether to use that information to emit a diagnostic based on some criteria or behave differently in another way).
>
> Sure, but we also don't want to block progress in some area because we have a desire to solve a bigger problem.
>
> -Chris
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

From dblaikie at gmail.com Thu Jul 25 09:25:06 2013
From: dblaikie at gmail.com (David Blaikie)
Date: Thu, 25 Jul 2013 09:25:06 -0700
Subject: [LLVMdev] Passing String to an external function in llvm
In-Reply-To: <1374768437655-59798.post@n5.nabble.com>
References: <1374768437655-59798.post@n5.nabble.com>
Message-ID: 

On Thu, Jul 25, 2013 at 9:07 AM, Abhinash Jain wrote:
> Hi All,
>
> In my llvm pass I have some variable named "expr" which is declared as:
>
>   string expr; // or char *expr;

I don't understand this question - these are C++ declarations, not LLVM declarations (not to mention LLVM IR doesn't have "variables", as such). So what's the actual LLVM entity you're dealing with?

> Now I want to pass this "expr" to some external function. How can I do this??

Have you tried compiling some code with Clang to see what IR it produces for different types, calls, and passing?

> Similarly, how can I pass a variable "var" to an external function, which is declared as:
>
>   Vector var;
>
> Any help will be appreciated..............
> > > > -- > View this message in context: http://llvm.1065342.n5.nabble.com/Passing-String-to-an-external-function-in-llvm-tp59798.html > Sent from the LLVM - Dev mailing list archive at Nabble.com. > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From cdavis5x at gmail.com Thu Jul 25 10:07:44 2013 From: cdavis5x at gmail.com (Charles Davis) Date: Thu, 25 Jul 2013 11:07:44 -0600 Subject: [LLVMdev] Transitioning build to cmake In-Reply-To: <6E4662D7-3572-4D74-9EF7-68C9319B430B@apple.com> References: <72BC7F3B-DFE5-425E-A185-3D7E1711874D@apple.com> <6E4662D7-3572-4D74-9EF7-68C9319B430B@apple.com> Message-ID: <0174F2DF-9592-46A9-A7E7-6A35E2877ED1@gmail.com> On Jul 25, 2013, at 10:21 AM, Jeremy Huddleston Sequoia wrote: > > On Jul 24, 2013, at 14:18, Charles Davis wrote: > >> >> On Jul 24, 2013, at 11:11 AM, Jeremy Huddleston Sequoia wrote: >> >>> I recently took a stab at changing the MacPorts llvm-3.4 port from the configure-based build system to the cmake-based build system. >>> >>> There are a couple of issues that I still haven't been able to work out yet and would like to know if these are just configuration issues on my side or bugs I should file at bugs.llvm.org: >>> >>> >> tl;dr: LLVM CMake support is primarily designed for Windows/Visual Studio (especially in LLVM proper) and Linux (especially in compiler-rt), and needs lots of work to work well on Mac OS X or anywhere else. In particular, it is missing many features that are present in the autotools build. (Though, as the CMake proponents 'round here are quick to point out, the autotools system is itself missing some features that are present in CMake.) 
> > The main reason for this was because it looked like autoconf was the second rate citizen: > 1) only cmake supported building building the ubsan runtime for a while: http://llvm.org/bugs/show_bug.cgi?id=14341 > 2) some guides only talk about building with cmake (eg http://clang.llvm.org/docs/HowToSetupToolingForLLVM.html) > 3) etc... > > If 2-systems is the long term goal, then I'll stick with what works (but still file bugs if cmake folks want to track them). If autoconf is going away at some point, I'd prefer to make the transition while we still have the fallback rather than have a period without it being available. About the only thing I can gather that we've been able to agree on is that we do want to get rid of "autohell" (as some people around here like to call it :) at some point. In fact, AFAICT we're just keeping it around because some internal Apple build processes (cf. utils/buildit) still need it. Most people around here just use CMake now--especially because then, they can use Ninja to build LLVM--but I seem to recall (and a lot of people seem to have forgotten) Daniel Dunbar talking about replacing both autoconf and CMake with some custom build system that he was going to write in Python; the LLVMBuild.txt stuff was supposed to be the first phase of that. (Then a bunch of people, mostly from *BSD, objected because they didn't want Python as a build-time dependency, even though it already is now. They seem to be doing fine with LLVM despite that.) I also remember Daniel saying that he didn't really like CMake, not least because (at least, at the time he said it) its Xcode support sucked horribly; but at this point, I'm not sure he's ever going to get around to finishing this new build system of his. When push finally comes to shove, we'll probably end up with a CMake-based build system (unless Daniel or someone else pipes up in the meantime about it). > >> ... 
> >> The CMake support in compiler-rt evolved in a completely different direction from the Makefiles; it was primarily designed originally, IIRC, to support building asan (and later, the other sanitizers), and mostly on Linux at that. Other platforms and configurations were an afterthought. It needs serious work--in particular, the runtime libraries are built with the host compiler (not usually a problem, unless you're building a cross compiler), and (as you've probably noticed by now) it doesn't make fat archives. Patches welcome if you can speak CMake ;). > > Ok. Unfortunately, I speak m4 way better than cmake. I've used it a few times, but I don't have much experience with it. Unfortunately, I don't have many CPU cycles of my own to devote to fixing this (or else, I might have fixed this already). @LLVMdev: any volunteers? > >>> ... >> Well that's odd. I have CMake from trunk installed, and I was able to use it to build a very simple project with one C source universal. I was also able to build CMake itself universal. There's no bug in CMake--at least, not anymore. There might, however, be a bug in LLVM's build system that's causing this. > > Ok, I'll try to figure this one out when I get some cycles. I can probably trace where that goes wrong. FWIW, I have cmake 2.8.10.2. I tried v2.8.10.2 on this really small test project that I've attached (the same one I used to verify that CMake HEAD works), and it builds fat just fine. So yeah, there's almost definitely a problem with LLVM CMake and universal builds. Or, maybe there's a local problem with your CMake. (Not likely, but still...) Try building the attached (after un-tar'ing) like so: $ cd test.cm $ cmake -DCMAKE_OSX_ARCHITECTURES="i386;x86_64" -G"Unix Makefiles" $ make and then you can run lipo -info on the "test" file that should've been built. It should report that there are two architectures in the file, i386 and x86_64. > > Thanks for your responses, No problem. 
Chip

> Jeremy

-------------- next part -------------- A non-text attachment was scrubbed... Name: test.tar.bz2 Type: application/x-bzip2 Size: 542 bytes Desc: not available URL: 

From omnia at mailinator.com Thu Jul 25 10:12:19 2013
From: omnia at mailinator.com (Abhinash Jain)
Date: Thu, 25 Jul 2013 10:12:19 -0700 (PDT)
Subject: [LLVMdev] Passing String to an external function in llvm
In-Reply-To: 
References: <1374768437655-59798.post@n5.nabble.com>
Message-ID: <1374772339718-59803.post@n5.nabble.com>

I did some computation through an llvm pass, and stored the computed values in a string, e.g.:

  stringstream lhs;
  lhs << instr->getOperand(1); // 'instr' is some instruction
  string lhsvar = lhs.str();

Now I want to pass this 'lhsvar' to the external function, so how can I do this?

This is just part of the code, to make it easier to understand.

-- View this message in context: http://llvm.1065342.n5.nabble.com/Passing-String-to-an-external-function-in-llvm-tp59798p59803.html
Sent from the LLVM - Dev mailing list archive at Nabble.com.

From dblaikie at gmail.com Thu Jul 25 10:36:32 2013
From: dblaikie at gmail.com (David Blaikie)
Date: Thu, 25 Jul 2013 10:36:32 -0700
Subject: [LLVMdev] Passing String to an external function in llvm
In-Reply-To: <1374772339718-59803.post@n5.nabble.com>
References: <1374768437655-59798.post@n5.nabble.com> <1374772339718-59803.post@n5.nabble.com>
Message-ID: 

On Thu, Jul 25, 2013 at 10:12 AM, Abhinash Jain wrote:
> I did some computation through an llvm pass, and stored the computed values in a string, e.g.:
>
>   stringstream lhs;
>   lhs << instr->getOperand(1); // 'instr' is some instruction
>   string lhsvar = lhs.str();
>
> Now I want to pass this 'lhsvar' to the external function, so how can I do this?

Why are you storing this in a string at all - just use the operand itself, it will be the LLVM Value you want to pass to the function.

> This is just part of the code, to make it easier to understand.
> If you say so, I can even provide the link to the code.
>
> -- View this message in context: http://llvm.1065342.n5.nabble.com/Passing-String-to-an-external-function-in-llvm-tp59798p59803.html
> Sent from the LLVM - Dev mailing list archive at Nabble.com.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

From echristo at gmail.com Thu Jul 25 10:46:41 2013
From: echristo at gmail.com (Eric Christopher)
Date: Thu, 25 Jul 2013 10:46:41 -0700
Subject: [LLVMdev] Deprecating and removing the MBlaze backend
In-Reply-To: 
References: <6E6EDD16-5F53-46F0-A8C9-9EF01247EBA0@apple.com>
Message-ID: 

Haven't looked at them, but the idea is preapproved and ... if you break it you buy it ;)

-eric

On Thu, Jul 25, 2013 at 7:09 AM, Rafael Espíndola wrote:
>> I say that we drop it. If someone steps up to start maintaining it, they can begin by resurrecting it from SVN.
>
> Patches attached.
>
> Cheers,
> Rafael

From shankare at codeaurora.org Thu Jul 25 10:54:23 2013
From: shankare at codeaurora.org (Shankar Easwaran)
Date: Thu, 25 Jul 2013 12:54:23 -0500
Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections
Message-ID: <51F1664F.1040003@codeaurora.org>

Hi,

Currently lld ties all the atoms in an ELF section together. This proposal breaks that tie by handling such sections differently. This requires NO ELF ABI changes.

Definitions:

A section is not considered safe if some code appears to be present between function boundaries, or if data that belongs to no symbol is placed at the end or beginning of the section. A section is considered safe if the symbols contained within it have been associated with their appropriate sizes and there is no data present between function boundaries. Examples of safe sections: code generated by compilers. Examples of unsafe sections: hand-written assembly code.
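[To make the "unsafe" case above concrete, here is a hypothetical fragment of hand-written assembly, not taken from the proposal, in which a literal pool sits between two function symbols; nothing records its size or which function owns it, so a linker cannot safely split the section into per-function atoms:]

```
        .section .text
func_a:
        movl    pool(%rip), %eax
        ret
pool:                           # data between function boundaries,
        .long   42              # attached to no sized symbol
func_b:
        ret
```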
Changes Needed:

The change that I am trying to propose is that the compiler emits a section, called .safe_sections, that contains the indices of the sections that are safe. The section would have the SHF_EXCLUDE flag, to prevent other linkers from consuming this section and carrying it through to the output file. Data structure for this:

.safe_sections
... ...

Advantages:

The atoms within a safe section could just be allocated in the output file, which means better output file layout, and better performance! This would also result in more atoms getting gc'ed.
a) looking at profile information
b) taking an order file

Changes needed in the assembler:
a) add an additional flag in the section for people writing assembly code, to mark a section safe or unsafe.

Changes needed in lld:
a) Read the safe section if it's present in the object file
b) Tie atoms together within a section if the section is not safe

Thanks,
Shankar Easwaran

-- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation

-------------- next part -------------- An HTML attachment was scrubbed... URL: 

From omnia at mailinator.com Thu Jul 25 11:07:09 2013
From: omnia at mailinator.com (Abhinash Jain)
Date: Thu, 25 Jul 2013 11:07:09 -0700 (PDT)
Subject: [LLVMdev] Passing String to an external function in llvm
In-Reply-To: 
References: <1374768437655-59798.post@n5.nabble.com> <1374772339718-59803.post@n5.nabble.com>
Message-ID: <1374775629112-59807.post@n5.nabble.com>

Thanx for the response.

  %x = alloca i32, align 4
  %y = alloca i32, align 4
  %a = alloca i32, align 4
  %t = alloca i32, align 4

  1. %10 = load i32* %x, align 4
  2. %11 = load i32* %y, align 4
  3. %div = sdiv i32 %10, %11
  4. %12 = load i32* %a, align 4
  5. %mul4 = mul nsw i32 %div, %12

  6. store i32 %mul4, i32* %t, align 4
  a. %mul4 = mul nsw i32 %div, %12
  b. %div = sdiv i32 %10, %11
  c. %10 = load i32* %x, align 4
  d. %11 = load i32* %y, align 4
  e. %12 = load i32* %a, align 4

This is the IR of "t = x/y*a". The addresses of %x, %y, %a are 0xbe4ab94, 0xbe4abd4, 0xbe4ac54 respectively, and the opcodes of '*' and '/' are 12 and 15 respectively.
Points (6.a to 6.e) are the result of the use-def chain, with the help of which I managed to make a string:

  "12_15_0xbe4ab94_0xbe4abd4_0xbe4ac54"

(by converting getOpcode() and getOperand() to string and then concatenating them). Now I want to pass this whole string to some other external function. How can I do this? If you want I can show you the full code.

-- View this message in context: http://llvm.1065342.n5.nabble.com/Passing-String-to-an-external-function-in-llvm-tp59798p59807.html
Sent from the LLVM - Dev mailing list archive at Nabble.com.

From echristo at gmail.com Thu Jul 25 11:11:49 2013
From: echristo at gmail.com (Eric Christopher)
Date: Thu, 25 Jul 2013 11:11:49 -0700
Subject: [LLVMdev] Does nounwind have semantics?
In-Reply-To: <51F0EB8B.10406@mxc.ca>
References: <58281658-BDAE-4D12-859A-859308EE6FF8@apple.com> <51ECB01C.5020605@mxc.ca> <251BD6D4E6A77E4586B482B33960D2283360813B@HASMSX106.ger.corp.intel.com> <51ECDE23.2020004@mxc.ca> <251BD6D4E6A77E4586B482B33960D22833608266@HASMSX106.ger.corp.intel.com> <251BD6D4E6A77E4586B482B33960D2283361BB88@HASMSX106.ger.corp.intel.com> <51F0EB8B.10406@mxc.ca>
Message-ID: 

>> Regarding auditing the intrinsics – I'd prefer to do this in stages.
>
> Sounds fine. I'd like Eric to take a quick look and agree that marking debug
> intrinsics speculatable is sane. (Yes they already are, but that doesn't
> mean it's sane. I also want Eric to know that 'speculatable' is going to
> start showing up in his debug info.)

Sure, seems reasonable so far.

-eric

> One other thing, your patch and Tobias Grosser's patch (subject "Make .bc
> en/decoding of AttrKind stable") are in conflict. Whoever lands second will
> need to merge.
>
> Nick
>
>> Here I'm just preserving the current behavior by marking intrinsics that
>> used to be explicitly handled in isSafeToSpeculativelyExecute(), so
>> there should be no functional change.
>> >> *From:*Nick Lewycky [mailto:nlewycky at google.com] >> *Sent:* Tuesday, July 23, 2013 02:29 >> *To:* Kuperstein, Michael M >> *Cc:* Nick Lewycky; llvmdev at cs.uiuc.edu >> *Subject:* Re: [LLVMdev] Does nounwind have semantics? >> >> >> On 22 July 2013 01:11, Kuperstein, Michael M >> > >> >> wrote: >> >> Of course frontends are free to put attributes, but it would be nice >> if optimizations actually used them. ;-) >> My use case is that of proprietary frontend that happens to know >> some library function calls - which are only resolved at link time - >> have no side effects and are safe to execute speculatively, and >> wants to tell the optimizer it can move them around however it >> likes. I'll gladly submit a patch that uses these hints, but I'd >> like to reach some consensus on what the desired attributes actually >> are first. The last thing I want is to add attributes that are only >> useful to myself. >> >> Regarding having several orthogonal attributes vs. things like >> "safetospeculate": >> >> To know a function is safe to speculatively execute, I need at least: >> 1) readnone (readonly is insufficient, unless I know all accessed >> pointers are valid) >> 2) nounwind >> 3) nolongjmp (I guess?) >> 4) no undefined behavior. This includes things like "halting" and >> "no division by zero", but that's not, by far, an exhaustive list. >> >> I guess there are several ways to handle (4). >> Ideally, I agree with you, we'd like a set of orthogonal attributes >> that, taken together, imply that the function's behavior is not >> undefined. >> But that requires mapping all sources of undefined behavior (I don't >> think this is currently documented for LLVM IR, at least not in a >> centralized fashion) and adding a very specific attribute for each >> of them. I'm not sure having function declarations with "readnone, >> nounwind, nolongjmp, halting, nodivbyzero, nopoisonval, >> nocomparelabels, nounreachable, ..." is desirable. 
>> >> We could also have a "welldefined" attribute and a "halting" >> attribute where "welldefined" subsumes "halting", if the specific >> case of a function which halts but may have undefined behavior is >> important. >> While the two are not orthogonal, it's similar to the situation with >> "readnone" and "readonly". Does that sound reasonable? >> >> You're entirely right. I forgot about undefined behaviour. >> >> If you want a 'speculatable' attribute, I would review that patch. >> Please audit the intrinsics (at least the target-independent ones) and >> appropriate library functions for whether you can apply this attribute >> to them. I think the only optimization that it can trigger is that >> "isSafeToSpeculativelyExecute" returns true on it. Anything else? Is it >> safe to infer readnone and nounwind from speculatable? >> >> I should mention that speculatable functions are extraordinarily limited >> in what they can do in the general (non–LLVM-as-a-library) case. They >> may be hoisted above calls to fork or pthread_create, they may be moved >> into global constructors (and thus can't depend on global state), etc. >> However, since you have a specific library you want to generate code >> against, you have the power to make use of it. I don't expect clang or >> dragonegg to be able to make use of it. >> >> Nick >> >> -----Original Message----- >> From: Nick Lewycky [mailto:nicholas at mxc.ca ] >> Sent: Monday, July 22, 2013 10:24 >> To: Kuperstein, Michael M >> Cc: Andrew Trick; llvmdev at cs.uiuc.edu >> Subject: Re: [LLVMdev] Does nounwind have semantics? >> >> Kuperstein, Michael M wrote: >> > I'm not sure I understand why it's blocked on that, by the way. >> >> It blocks our ability to automatically deduce the halting attribute >> in the optimizer, which was necessary for the use case I had at the >> time. >> If you have a use case of your own, feel free to propose the patch! >> >> (Technically it's not *blocked* -- see how my patch does it! 
-- but >> the workarounds are too horrible to be committed.) >> >> > Even if we can't apply the attribute ourselves, I don't see why >> we wouldn't expose that ability to frontends. >> >> Frontends are free to put attributes on functions if they want to. >> Go for it! >> >> > I'm not entirely sure "halting" is the right attribute either, by >> the way. >> > What I, personally, would like to see is a way to specify a >> function call is safe to speculatively execute. That implies >> readnone (not just readonly), nounwind, halting - and Eris knows >> what else. Nick, is that too strong for you? >> >> I strongly prefer the approach of having orthogonal attributes. >> There are optimizations that you can do with each of these >> attributes on their own. In particular I think that >> readonly+halting+nounwind+nolongjmp is going to be common and I'd >> feel silly if we had a special case for >> readnone+halting+nounwind+nolongjmp and thus couldn't optimize the >> more >> common case. >> >> That said, I'm also going to feel silly if we don't end up with enough >> attributes to allow isSafeToSpeculate to deduce it, which is where we >> are right now. I was planning to get back to fixing this after >> Chandler's promised PassManager work. >> >> Nick >> >> > >> > Michael >> > >> > -----Original Message----- >> > From: llvmdev-bounces at cs.uiuc.edu >> >> [mailto:llvmdev-bounces at cs.uiuc.edu >> ] On Behalf Of Nick Lewycky >> > Sent: Monday, July 22, 2013 07:08 >> > To: Andrew Trick >> > Cc: llvmdev at cs.uiuc.edu >> > Subject: Re: [LLVMdev] Does nounwind have semantics? >> > >> > Andrew Trick wrote: >> >> Does 'nounwind' have semantics that inform optimization passes? >> It seems to in some cases, but not consistently. For example... 
>> >> int32_t foo(int32_t* ptr) {
>> >>   int i = 0;
>> >>   int result;
>> >>   do {
>> >>     bar(ptr);
>> >>     result = *ptr;
>> >>     bar(ptr);
>> >>   } while (i++ < *ptr);
>> >>   return result;
>> >> }
>> >>
>> >> Say we have a front end that declares bar as...
>> >>
>> >>   declare void @bar(i32*) readonly;
>> >>
>> >> So 'bar' is 'readonly' and 'may-unwind'.
>> >>
>> >> When LICM tries to hoist the load it interprets the 'may-unwind' as "MayThrow" in LICM-language and bails. However, when it tries to sink the call itself it sees the 'readonly', assumes no side effects and sinks it below the loads. Hmm...
>> >>
>> >> There doesn't appear to be a way to declare a function that is guaranteed not to write to memory in a way that affects the caller, but may have another well-defined side effect like aborting the program. This is interesting, because that is the way runtime checks for safe languages would like to be defined. I'm perfectly happy telling front ends to generate control flow for well-defined traps, since I like lots of basic blocks in my IR. But I'm still curious how others deal with this.
>> >
>> > Yes, we went through a phase where people would try to use "nounwind+readonly == no side-effects" to optimize. All such optimizations are wrong. Unless otherwise proven, a function may inf-loop, terminate the program, or longjmp.
>> >
>> > I tried to add 'halting' to help solve part of this a long time ago, but it never went in. The problem is that determining whether you have loops requires a FunctionPass (LoopInfo to find loops and SCEV to determine an upper bound) and applying function attributes is an SCC operation (indeed, an SCC is itself a loop), so it's all blocked behind fixing the PassManager to allow CGSCCPasses to depend on FunctionPasses.
>> > >> >> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20100705/103670.html >> > >> > I'm now in a similar situation where I want 'nounwind' to mean >> "only exits by terminating the program or a return instruction" but >> unfortunately functions which longjmp are considered nounwind. I >> would like to change llvm to make longjmp'ing a form of unwinding >> (an exceptional exit to the function), but if I were to apply that >> rule today then we'd start putting dwarf eh tables on all our C >> code, oops. >> > >> > Nick >> > _______________________________________________ >> > LLVM Developers mailing list >> > LLVMdev at cs.uiuc.edu >> >> http://llvm.cs.uiuc.edu >> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> > >> --------------------------------------------------------------------- >> > Intel Israel (74) Limited >> > >> > This e-mail and any attachments may contain confidential material >> for >> > the sole use of the intended recipient(s). Any review or >> distribution >> > by others is strictly prohibited. If you are not the intended >> > recipient, please contact the sender and delete all copies. >> > >> > >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu >> http://llvm.cs.uiuc.edu >> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >
From tobias at grosser.es Thu Jul 25 11:13:50 2013 From: tobias at grosser.es (Tobias Grosser) Date: Thu, 25 Jul 2013 11:13:50 -0700 Subject: [LLVMdev] Does nounwind have semantics? In-Reply-To: <251BD6D4E6A77E4586B482B33960D2283361BCA1@HASMSX106.ger.corp.intel.com> References: <58281658-BDAE-4D12-859A-859308EE6FF8@apple.com> <51ECB01C.5020605@mxc.ca> <251BD6D4E6A77E4586B482B33960D2283360813B@HASMSX106.ger.corp.intel.com> <51ECDE23.2020004@mxc.ca> <251BD6D4E6A77E4586B482B33960D22833608266@HASMSX106.ger.corp.intel.com> <251BD6D4E6A77E4586B482B33960D2283361BB88@HASMSX106.ger.corp.intel.com> <51F0EB8B.10406@mxc.ca> <251BD6D4E6A77E4586B482B33960D2283361BCA1@HASMSX106.ger.corp.intel.com> Message-ID: <51F16ADE.7020404@grosser.es> On 07/25/2013 02:14 AM, Kuperstein, Michael M wrote: > Right, will fix the CallInst, copy/pasted from a different case and didn't notice what I was doing, thanks. > > Re LangRef.rst, that's what I meant, I'm still hoping for better suggestions regarding the name... > > As to the conflict - Tobias, feel free to go first, I'll merge. I am still waiting for a review. The merge will be trivial, so just commit whenever you are ready. Tobi From dblaikie at gmail.com Thu Jul 25 11:16:46 2013 From: dblaikie at gmail.com (David Blaikie) Date: Thu, 25 Jul 2013 11:16:46 -0700 Subject: [LLVMdev] Passing String to an external function in llvm In-Reply-To: <1374775629112-59807.post@n5.nabble.com> References: <1374768437655-59798.post@n5.nabble.com> <1374772339718-59803.post@n5.nabble.com> <1374775629112-59807.post@n5.nabble.com> Message-ID: On Thu, Jul 25, 2013 at 11:07 AM, Abhinash Jain wrote: > Thanx for the response. > > %x = alloca i32, align 4 > %y = alloca i32, align 4 > %a = alloca i32, align 4 > %t = alloca i32, align 4 > > 1. %10 = load i32* %x, align 4 > 2. %11 = load i32* %y, align 4 > 3.
%div = sdiv i32 %10, %11 > 4. %12 = load i32* %a, align 4 > 5. %mul4 = mul nsw i32 %div, %12 > > 6. store i32 %mul4, i32* %t, align 4 > a. %mul4 = mul nsw i32 %div, %12 > b. %div = sdiv i32 %10, %11 > c. %10 = load i32* %x, align 4 > d. %11 = load i32* %y, align 4 > e. %12 = load i32* %a, align 4 > > This is a IR of " t = x/y*a " > The address of the "%x, %y, %a" are "0xbe4ab94, 0xbe4abd4, 0xbe4ac54" > respectively. > and the opcode of "*, /" are "12, 15" respectively. > > Point (6.a to 6.e) are the result of use-def chain, with the help of which > I managed to make a string as - > > "12_15_0xbe4ab94_0xbe4abd4_0xbe4ac54" (by converting getOpcode() and > getOperand() to string and than concatenating it) > > Now I want to pass this whole string to some other external function, How > can I do this? It's still rather unclear what you're attempting to achieve - are you trying to pass this string to something else within the compiler? If so, you'd pass a string the same way you'd pass any other string in C++ (well, in LLVM: void func(StringRef Str); ... std::string foo = "12_15_...."; func(foo); If you're trying to emit this string into the compiled binary & produce a function call in that program... that's another story, but again - looking at simple examples of code compiled by Clang should give you an idea of what instructions you need to emit. > > If you want I can show you the full code. > > > > -- > View this message in context: http://llvm.1065342.n5.nabble.com/Passing-String-to-an-external-function-in-llvm-tp59798p59807.html > Sent from the LLVM - Dev mailing list archive at Nabble.com. 
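[Editorial note] David's `func(StringRef)` sketch above, made runnable with `std::string_view` standing in for `llvm::StringRef` (an assumption so the example compiles without LLVM headers; requires C++17):

```cpp
#include <string>
#include <string_view>

// std::string_view plays the role of llvm::StringRef here: a cheap,
// non-owning view to which a std::string converts implicitly.
static std::string captured;

void func(std::string_view str) { captured = std::string(str); }

std::string demo() {
    std::string foo = "12_15_0xbe4ab94_0xbe4abd4_0xbe4ac54";
    func(foo);  // implicit conversion; the character buffer is not copied
    return captured;
}
```

The callee never owns the bytes; it only views the caller's string, which is exactly the StringRef idiom used throughout LLVM.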
> _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From clattner at apple.com Thu Jul 25 11:22:57 2013 From: clattner at apple.com (Chris Lattner) Date: Thu, 25 Jul 2013 11:22:57 -0700 Subject: [LLVMdev] Deprecating and removing the MBlaze backend In-Reply-To: References: <6E6EDD16-5F53-46F0-A8C9-9EF01247EBA0@apple.com> Message-ID: On Jul 25, 2013, at 7:09 AM, Rafael Espíndola wrote: >> I say that we drop it. If someone steps up to start maintaining it, they can begin by resurrecting it from SVN. > > Patches attached. Do it! -Chris > > Cheers, > Rafael > From nicholas at mxc.ca Thu Jul 25 11:36:25 2013 From: nicholas at mxc.ca (Nick Lewycky) Date: Thu, 25 Jul 2013 11:36:25 -0700 Subject: [LLVMdev] Error for AllocaInst and Instruction In-Reply-To: References: Message-ID: <51F17029.4080503@mxc.ca> Rasha Omar wrote: > Hi, > For the following code > const Type * Int32Type = > IntegerType::getInt32Ty(getGlobalContext()); Don't use const types any more. Since llvm 3.0, nothing takes a const Type*. 
Nick > AllocaInst* newInst = new AllocaInst(Int32Type, 0, "flag", Bb); > Bb->getInstList().push_back(newInst); > > It gives me the error > " error: no matching constructor for initialization of 'llvm::AllocaInst' > AllocaInst* newInst = new AllocaInst(Int32Type, 0, "flag", > Bb);" > > > By using Instruction > > const Type * Int32Type = > IntegerType::getInt32Ty(getGlobalContext()); > Instruction* newInst = new Instruction(Int32Type, 0, "flag", Bb); > Bb->getInstList().push_back(newInst); > > > error: allocating an object of abstract class type 'llvm::Instruction' > Instruction* newInst = new Instruction(Int32Type, 0, > "flag", Bb); > > > -- > *Rasha Salah Omar > Msc Student at E-JUST > Demonestrator at Faculty of Computers and Informatics > Benha University > * > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From omnia at mailinator.com Thu Jul 25 11:40:59 2013 From: omnia at mailinator.com (Abhinash Jain) Date: Thu, 25 Jul 2013 11:40:59 -0700 (PDT) Subject: [LLVMdev] Passing String to an external function in llvm In-Reply-To: References: <1374768437655-59798.post@n5.nabble.com> <1374772339718-59803.post@n5.nabble.com> <1374775629112-59807.post@n5.nabble.com> Message-ID: <1374777659079-59812.post@n5.nabble.com> I have one file named hashtable.cpp whose link is "http://pastebin.com/Cq2Qy50C" and one llvm pass named testing.cpp whose link is "http://pastebin.com/E3RemxLF" Now on this testing.cpp pass I have computed the string named "expr" which I want to pass to the function named hashtable(string) in hashtable.cpp (on line 106 of testing.cpp) > looking at simple examples of code compiled by Clang should give you an > idea of what instructions you need to emit. I did try but not able to pass the string. If you could go through my code it will be of great help. 
Thanx -- View this message in context: http://llvm.1065342.n5.nabble.com/Passing-String-to-an-external-function-in-llvm-tp59798p59812.html Sent from the LLVM - Dev mailing list archive at Nabble.com. From justin.holewinski at gmail.com Thu Jul 25 12:33:05 2013 From: justin.holewinski at gmail.com (Justin Holewinski) Date: Thu, 25 Jul 2013 15:33:05 -0400 Subject: [LLVMdev] Status of getPointerSize()/getPointerTy() per address space? Message-ID: Looking through recent additions, it looks like the infrastructure exists for targets to specify a per-address-space pointer size/type, but this does not seem to be used anywhere (SelectionDAGBuilder, legalizer, etc.). What is the status of this support? Is anyone actively working on it? -- Thanks, Justin Holewinski -------------- next part -------------- An HTML attachment was scrubbed... URL: From arsenm2 at gmail.com Thu Jul 25 12:41:08 2013 From: arsenm2 at gmail.com (Matt Arsenault) Date: Thu, 25 Jul 2013 12:41:08 -0700 Subject: [LLVMdev] Status of getPointerSize()/getPointerTy() per address space? In-Reply-To: References: Message-ID: <2D2A1070-5DA8-4A4F-8E32-2557E8AE486C@gmail.com> On Jul 25, 2013, at 12:33 , Justin Holewinski wrote: > Looking through recent additions, it looks like the infrastructure exists for targets to specify a per-address-space pointer size/type, but this does not seem to be used anywhere (SelectionDAGBuilder, legalizer, etc.). What is the status of this support? Is anyone actively working on it? > I'm actively working on this. I think I have most of the target independent parts ready that I'm gradually sending through review. I haven't looked at what needs to be done in SelectionDAG yet. From justin.holewinski at gmail.com Thu Jul 25 12:58:56 2013 From: justin.holewinski at gmail.com (Justin Holewinski) Date: Thu, 25 Jul 2013 15:58:56 -0400 Subject: [LLVMdev] Status of getPointerSize()/getPointerTy() per address space? 
In-Reply-To: <2D2A1070-5DA8-4A4F-8E32-2557E8AE486C@gmail.com> References: <2D2A1070-5DA8-4A4F-8E32-2557E8AE486C@gmail.com> Message-ID: Awesome! What will the requirements be for the target? Is it sufficient to just override getPointerTy and add appropriate data layout strings, or will more hooks be needed? On Thu, Jul 25, 2013 at 3:41 PM, Matt Arsenault wrote: > > On Jul 25, 2013, at 12:33 , Justin Holewinski > wrote: > > > Looking through recent additions, it looks like the infrastructure > exists for targets to specify a per-address-space pointer size/type, but > this does not seem to be used anywhere (SelectionDAGBuilder, legalizer, > etc.). What is the status of this support? Is anyone actively working on > it? > > > > I'm actively working on this. I think I have most of the target > independent parts ready that I'm gradually sending through review. I > haven't looked at what needs to be done in SelectionDAG yet. -- Thanks, Justin Holewinski -------------- next part -------------- An HTML attachment was scrubbed... 
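[Editorial note] For reference, the per-address-space pointer sizes under discussion are encoded in the target's data layout string. A hypothetical target with 64-bit pointers in the default address space and 32-bit pointers in address space 1 might carry something like the following (the exact fields are illustrative, not taken from the thread):

```
target datalayout = "e-p:64:64:64-p1:32:32:32-i32:32:32-n32:64"
```

Here `p:` describes address space 0 and `p1:` describes address space 1, which is what makes a single global `getPointerTy()` insufficient.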
URL: From dblaikie at gmail.com Thu Jul 25 13:05:47 2013 From: dblaikie at gmail.com (David Blaikie) Date: Thu, 25 Jul 2013 13:05:47 -0700 Subject: [LLVMdev] Passing String to an external function in llvm In-Reply-To: <1374777659079-59812.post@n5.nabble.com> References: <1374768437655-59798.post@n5.nabble.com> <1374772339718-59803.post@n5.nabble.com> <1374775629112-59807.post@n5.nabble.com> <1374777659079-59812.post@n5.nabble.com> Message-ID: On Thu, Jul 25, 2013 at 11:40 AM, Abhinash Jain wrote: > I have one file named hashtable.cpp whose link is > "http://pastebin.com/Cq2Qy50C" > > and one llvm pass named testing.cpp whose link is > "http://pastebin.com/E3RemxLF" > > Now on this testing.cpp pass I have computed the string named "expr" which I > want to pass to the function named hashtable(string) in hashtable.cpp (on > line 106 of testing.cpp) > > >> looking at simple examples of code compiled by Clang should give you an >> idea of what instructions you need to emit. > > I did try but not able to pass the string. > > If you could go through my code it will be of great help. Thanx OK - seems you might want to take a few steps back & understand how C++ code is written/structured generally (and/or take a look at other parts of the compiler). You'll need a header file with the declaration of your function & you can include that header file in the hashtable.cpp and testing.cpp - if that sentence doesn't make sense to you yet, please find some general C++ programming resources to provide further detail as such a discussion isn't really on-topic here. 
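[Editorial note] A minimal illustration of the header-file arrangement described above, collapsed into one translation unit for brevity (the file boundaries are shown as comments; the names come from the thread, the bodies are placeholders):

```cpp
#include <string>

// hashtable.h (shared header, hypothetical): only the declaration lives
// here, so both testing.cpp and hashtable.cpp can include it.
void hashtable(const std::string &expr);

// hashtable.cpp: the single definition.
static std::string last_seen;
void hashtable(const std::string &expr) { last_seen = expr; }

// testing.cpp: a caller that needs nothing beyond the declaration above.
std::string record(const std::string &expr) {
    hashtable(expr);
    return last_seen;
}
```

In the real two-file setup the linker resolves the call in `record` against the definition in hashtable.cpp; the header only promises the signature.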
- David From omnia at mailinator.com Thu Jul 25 13:45:52 2013 From: omnia at mailinator.com (Abhinash Jain) Date: Thu, 25 Jul 2013 13:45:52 -0700 (PDT) Subject: [LLVMdev] Passing String to an external function in llvm In-Reply-To: References: <1374768437655-59798.post@n5.nabble.com> <1374772339718-59803.post@n5.nabble.com> <1374775629112-59807.post@n5.nabble.com> <1374777659079-59812.post@n5.nabble.com> Message-ID: <1374785152895-59818.post@n5.nabble.com> > OK - seems you might want to take a few steps back & understand how > C++ code is written/structured generally (and/or take a look at other > parts of the compiler). You'll need a header file with the declaration > of your function & you can include that header file in the > hashtable.cpp and testing.cpp - if that sentence doesn't make sense to > you yet, please find some general C++ programming resources to provide > further detail as such a discussion isn't really on-topic here. I know it could be a bit frustrating for you to answer such foolish question, but am novice to llvm and desperately need some help to resolve this issue. testing.cpp is an instrumentation file of llvm, whereas hashtable.cpp is simple c++ code. I want to insert the function named hashtable() (which is already present in hashtable.cpp) after every store instruction. So I have to first DECLARE it, which can be done with getOrInsertFunction() in testing.cpp. Now since declaration of function hashtable() is like void hashtable(string expr) //in hashtable.cpp So in getOrInsertFunction() I have to pass some more details like getOrInsertFunction (name of function in IR, Return type of function, List of argument for the function, etc.) i.e. getOrInsertFunction("_Z14printHashTablev", Type::getVoidTy(M.getContext()), ????, ????,); Now hashtable() function takes string as an argument, so what should I write in place of ???? above. 
And than how should I actually CALL that hashtable() function with the help of methods like CallInst::Create, getInstList().insert etc. -- View this message in context: http://llvm.1065342.n5.nabble.com/Passing-String-to-an-external-function-in-llvm-tp59798p59818.html Sent from the LLVM - Dev mailing list archive at Nabble.com. From dblaikie at gmail.com Thu Jul 25 13:51:55 2013 From: dblaikie at gmail.com (David Blaikie) Date: Thu, 25 Jul 2013 13:51:55 -0700 Subject: [LLVMdev] Passing String to an external function in llvm In-Reply-To: <1374785152895-59818.post@n5.nabble.com> References: <1374768437655-59798.post@n5.nabble.com> <1374772339718-59803.post@n5.nabble.com> <1374775629112-59807.post@n5.nabble.com> <1374777659079-59812.post@n5.nabble.com> <1374785152895-59818.post@n5.nabble.com> Message-ID: On Thu, Jul 25, 2013 at 1:45 PM, Abhinash Jain wrote: >> OK - seems you might want to take a few steps back & understand how >> C++ code is written/structured generally (and/or take a look at other >> parts of the compiler). You'll need a header file with the declaration >> of your function & you can include that header file in the >> hashtable.cpp and testing.cpp - if that sentence doesn't make sense to >> you yet, please find some general C++ programming resources to provide >> further detail as such a discussion isn't really on-topic here. > > I know it could be a bit frustrating for you to answer such foolish > question, but am novice to llvm and desperately need some help to resolve > this issue. > > testing.cpp is an instrumentation file of llvm, whereas hashtable.cpp is > simple c++ code. > > I want to insert the function named hashtable() (which is already present in > hashtable.cpp) after every store instruction. > So I have to first DECLARE it, which can be done with getOrInsertFunction() > in testing.cpp. 
> Now since declaration of function hashtable() is like > > void hashtable(string expr) //in hashtable.cpp > > So in getOrInsertFunction() I have to pass some more details like > > getOrInsertFunction (name of function in IR, Return type of function, List > of argument for the function, etc.) > i.e. getOrInsertFunction("_Z14printHashTablev", > Type::getVoidTy(M.getContext()), ????, ????,); I'd suggest you try looking at the IR that Clang generates for a call to this function & produce similar IR yourself. If you're having trouble figuring out which APIs to call to produce the same IR there is a "cpp" backend to LLVM that can spit out the LLVM C++ API calls to produce the same IR, I believe - I don't recall the specifics of how to invoke it, though. > > Now hashtable() function takes string as an argument, so what should I write > in place of ???? above. > > And than how should I actually CALL that hashtable() function with the help > of methods > like CallInst::Create, getInstList().insert etc. > > > > -- > View this message in context: http://llvm.1065342.n5.nabble.com/Passing-String-to-an-external-function-in-llvm-tp59798p59818.html > Sent from the LLVM - Dev mailing list archive at Nabble.com. > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From ruiu at google.com Thu Jul 25 13:56:00 2013 From: ruiu at google.com (Rui Ueyama) Date: Thu, 25 Jul 2013 13:56:00 -0700 Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections In-Reply-To: <51F1664F.1040003@codeaurora.org> References: <51F1664F.1040003@codeaurora.org> Message-ID: I think I share the goal with you to make the foundation for better dead-strip, so thank you for suggesting. I'm not sure if marking a section as a whole as "safe" or "unsafe" is the best approach, though. Some comments. 
- If the compiler generated code is always "safe", and if we can distinguish it from hand-written assembly code by checking if there's a gap between symbols, can we just assume a section with no gap is always "safe"? - "Safeness" is not an attribute of the section but of the symbol, I think. The symbol is "safe" if there's no direct reference to the symbol data. All references should go through relocations. A section may contain both "safe" and "unsafe" symbols. - How about making the compiler to create a new section for each "safe" atom, as it does for inline functions? On Thu, Jul 25, 2013 at 10:54 AM, Shankar Easwaran wrote: > Hi, > > Currently lld ties up all atoms in a section for ELF together. This > proposal just breaks it by handling it differently. > > *This requires **NO ELF ABI changes. > > **Definitions :-* > > A section is not considered safe if there is some code that appears to be > present between function boundaries (or) optimizes sections to place data > at the end or beginning of a section (that contains no symbol). > > A section is considered safe if symbols contained within the section have > been associated with their appropriate sizes and there is no data present > between function boundaries. > > Examples of safe sections are, code generated by compilers. > > Examples of unsafe sections are, hand written assembly code. > > *Changes Needed :-* > > The change that I am trying to propose is the compiler emits a section, > called (*.safe_sections) *that contains section indices on what sections > are safe. > > The section would have a SHF_EXCLUDE flag, to prevent other linkers from > consuming this section and making it to the output file. > > Data structure for this :- > > .safe_sections > >
> ... > ... > > > *Advantages > *There are advantages that the atoms within a safe section could just be > allocated in the output file which means better output file layout, and > Better performance! > > This would also result in more atoms getting gc'ed. > > a) looking at profile information > b) taking a order file > > *Changes needed in the assembler > > *a) add an additional flag in the section for people writing assembly > code, to mark a section safe or unsafe. > * > **Changes needed in lld > > *a) Read the safe section if its present in the object file > b) Tie atoms together within a section if the section is not safe > * > *Thanks > > Shankar Easwaran* > * > > -- > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shankare at codeaurora.org Thu Jul 25 14:01:05 2013 From: shankare at codeaurora.org (Shankar Easwaran) Date: Thu, 25 Jul 2013 16:01:05 -0500 Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections In-Reply-To: References: <51F1664F.1040003@codeaurora.org> Message-ID: <51F19211.6050403@codeaurora.org> On 7/25/2013 3:56 PM, Rui Ueyama wrote: > I think I share the goal with you to make the foundation for better > dead-strip, so thank you for suggesting. I'm not sure if marking a section > as a whole as "safe" or "unsafe" is the best approach, though. Some > comments. > > - If the compiler generated code is always "safe", and if we can > distinguish it from hand-written assembly code by checking if there's a gap > between symbols, can we just assume a section with no gap is always "safe"? Gaps could just be caused due to alignment, but the code may be safe, which the compiler knows very well. > - "Safeness" is not an attribute of the section but of the symbol, I > think. The symbol is "safe" if there's no direct reference to the symbol > data. All references should go through relocations. 
A section may contain > both "safe" and "unsafe" symbols. Sections contain symbols. In the context of ELF, marking sections as safe/not is more desirable because of the switches (-ffunction-sections and -fdata-sections available already). > - How about making the compiler to create a new section for each "safe" > atom, as it does for inline functions? You already have a switch called -ffunction-sections and -fdata-sections to put function and data in separate sections. > > > On Thu, Jul 25, 2013 at 10:54 AM, Shankar Easwaran > wrote: > >> Hi, >> >> Currently lld ties up all atoms in a section for ELF together. This >> proposal just breaks it by handling it differently. >> >> *This requires **NO ELF ABI changes. >> >> **Definitions :-* >> >> A section is not considered safe if there is some code that appears to be >> present between function boundaries (or) optimizes sections to place data >> at the end or beginning of a section (that contains no symbol). >> >> A section is considered safe if symbols contained within the section have >> been associated with their appropriate sizes and there is no data present >> between function boundaries. >> >> Examples of safe sections are, code generated by compilers. >> >> Examples of unsafe sections are, hand written assembly code. >> >> *Changes Needed :-* >> >> The change that I am trying to propose is the compiler emits a section, >> called (*.safe_sections) *that contains section indices on what sections >> are safe. >> >> The section would have a SHF_EXCLUDE flag, to prevent other linkers from >> consuming this section and making it to the output file. >> >> Data structure for this :- >> >> .safe_sections >> >>
>> ... >> ... >> >> >> *Advantages >> *There are advantages that the atoms within a safe section could just be >> allocated in the output file which means better output file layout, and >> Better performance! >> >> This would also result in more atoms getting gc'ed. >> >> a) looking at profile information >> b) taking a order file >> >> *Changes needed in the assembler >> >> *a) add an additional flag in the section for people writing assembly >> code, to mark a section safe or unsafe. >> * >> **Changes needed in lld >> >> *a) Read the safe section if its present in the object file >> b) Tie atoms together within a section if the section is not safe >> * >> *Thanks >> >> Shankar Easwaran* >> * >> >> -- >> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation >> >> -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation From ruiu at google.com Thu Jul 25 14:10:04 2013 From: ruiu at google.com (Rui Ueyama) Date: Thu, 25 Jul 2013 14:10:04 -0700 Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections In-Reply-To: <51F19211.6050403@codeaurora.org> References: <51F1664F.1040003@codeaurora.org> <51F19211.6050403@codeaurora.org> Message-ID: Is there any reason -ffunction-sections and -fdata-sections wouldn't work? If it'll work, it may be better to say "if you want to get a better linker output use these options", rather than defining a new ELF section. On Thu, Jul 25, 2013 at 2:01 PM, Shankar Easwaran wrote: > On 7/25/2013 3:56 PM, Rui Ueyama wrote: >> I think I share the goal with you to make the foundation for better >> dead-strip, so thank you for suggesting. I'm not sure if marking a section >> as a whole as "safe" or "unsafe" is the best approach, though. Some >> comments.
>> >> - If the compiler generated code is always "safe", and if we can >> distinguish it from hand-written assembly code by checking if there's a >> gap >> between symbols, can we just assume a section with no gap is always >> "safe"? >> > Gaps could just be caused due to alignment, but the code may be safe, > which the compiler knows very well. > > > - "Safeness" is not an attribute of the section but of the symbol, I >> think. The symbol is "safe" if there's no direct reference to the symbol >> data. All references should go through relocations. A section may contain >> both "safe" and "unsafe" symbols. >> > Sections contain symbols. In the context of ELF, marking sections as > safe/not is more desirable because of the switches (-ffunction-sections and > -fdata-sections available already). > > > - How about making the compiler to create a new section for each "safe" >> atom, as it does for inline functions? >> > You already have a switch called -ffunction-sections and -fdata-sections > to put function and data in seperate sections. > > >> >> On Thu, Jul 25, 2013 at 10:54 AM, Shankar Easwaran >> **wrote: >> >> Hi, >>> >>> Currently lld ties up all atoms in a section for ELF together. This >>> proposal just breaks it by handling it differently. >>> >>> *This requires **NO ELF ABI changes. >>> >>> **Definitions :-* >>> >>> >>> A section is not considered safe if there is some code that appears to be >>> present between function boundaries (or) optimizes sections to place data >>> at the end or beginning of a section (that contains no symbol). >>> >>> A section is considered safe if symbols contained within the section have >>> been associated with their appropriate sizes and there is no data present >>> between function boundaries. >>> >>> Examples of safe sections are, code generated by compilers. >>> >>> Examples of unsafe sections are, hand written assembly code. 
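[Editorial note] The .safe_sections table proposed in this thread is just a list of section header indices. A linker-side sketch of how it might be modeled and queried follows; the in-memory layout is an assumption, since the proposal does not fix an encoding:

```c
#include <stdint.h>

/* Hypothetical decoded form of .safe_sections: a count followed by the
 * ELF section header indices (shndx values) the compiler marked safe. */
typedef struct {
    uint32_t num_entries;
    const uint32_t *section_index;
} SafeSections;

/* Nonzero if shndx is listed as safe; atoms in unlisted (unsafe)
 * sections stay tied together, as the proposal describes. */
int is_safe_section(const SafeSections *ss, uint32_t shndx) {
    for (uint32_t i = 0; i < ss->num_entries; ++i)
        if (ss->section_index[i] == shndx)
            return 1;
    return 0;
}
```

With SHF_EXCLUDE set on the carrying section, only a cooperating linker would ever read this table; other linkers simply drop it.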
>>> >>> *Changes Needed :-* >>> >>> >>> The change that I am trying to propose is the compiler emits a section, >>> called (*.safe_sections) *that contains section indices on what sections >>> >>> are safe. >>> >>> The section would have a SHF_EXCLUDE flag, to prevent other linkers from >>> consuming this section and making it to the output file. >>> >>> Data structure for this :- >>> >>> .safe_sections >>> >>>
>>> ... >>> ... >>> >>> >>> *Advantages >>> *There are advantages that the atoms within a safe section could just be >>> >>> allocated in the output file which means better output file layout, and >>> Better performance! >>> >>> This would also result in more atoms getting gc'ed. >>> >>> a) looking at profile information >>> b) taking a order file >>> >>> *Changes needed in the assembler >>> >>> *a) add an additional flag in the section for people writing assembly >>> >>> code, to mark a section safe or unsafe. >>> * >>> **Changes needed in lld >>> >>> *a) Read the safe section if its present in the object file >>> >>> b) Tie atoms together within a section if the section is not safe >>> * >>> *Thanks >>> >>> Shankar Easwaran* >>> >>> * >>> >>> -- >>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, >>> hosted by the Linux Foundation >>> >>> >>> > > -- > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted > by the Linux Foundation > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From qcolombet at apple.com Fri Jul 26 09:18:04 2013 From: qcolombet at apple.com (Quentin Colombet) Date: Fri, 26 Jul 2013 09:18:04 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: <28838467-2769-4E2B-BF7B-01DDCADA8F54@apple.com> References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> <508C534A-D051-46BA-9C20-8B6D344BA508@apple.com> <3592886B-E580-479A-AF8D-006FD6A4A44A@apple.com> Message-ID: Hi, On Jul 25, 2013, at 9:07 PM, Chris Lattner wrote: > On Jul 25, 2013, at 6:04 PM, David Blaikie wrote: >>>> void report(enum Kind, StringRef StringData, enum Classification, StringRef msg) >>> >>> The idea is that "StringData+Kind" can be used to format something nice in clang, but that "msg" fully covers it for clients that don't know the Kind enum. Sounds good! 
>> >> Presumably there are diagnostics that are going to have more than one >> parameter, though. ArrayRef? (though this doesn't scale to >> providing fancy things like DIVariable parameters, that would possibly >> allow Clang to lookup the variable (or function, etc) & print out >> cursors, etc) > > It's possible, but the full generality of any diagnostic can be folded (at some loss of generality) into a single string. To balance complexity and generality vs simplicity, "1" is a pretty decent number. Agreed. Thanks for the help. Cheers, -Quentin -------------- next part -------------- An HTML attachment was scrubbed... URL: From hfinkel at anl.gov Fri Jul 26 08:33:25 2013 From: hfinkel at anl.gov (Hal Finkel) Date: Fri, 26 Jul 2013 10:33:25 -0500 (CDT) Subject: [LLVMdev] [icFuzz] Help needed with analyzing randomly generated tests that fail on clang 3.4 trunk In-Reply-To: <668642398.8242115.1372268637416.JavaMail.root@alcf.anl.gov> Message-ID: <82364744.14764035.1374852805819.JavaMail.root@alcf.anl.gov> ----- Original Message ----- > ----- Original Message ----- > > Great job, Hal! > > > > Sure. I'd be happy to run icFuzz and report the fails once these > > bugs > > are fixed and thereafter whenever people want new runs. Obviously, > > this can be automated, but the problem is that icFuzz is not > > currently open sourced. > > I would be happy to see this open sourced, but I think that we can > work something out regardless. > > Also, once we get the current set of things resolved, I think it > would be useful to test running with: > > -- -O3, LTO (-O4 or -flto), > -- -fslp-vectorize, -fslp-vectorize-aggressive (which are actually > separate optimizations) > -- -ffast-math (if you can do floating point with tolerances, or at > least -ffinite-math-only), -fno-math-errno > (and there are obviously a whole bunch of non-default > code-generation and target options). > > Is it feasible to set up runs with different flags? 
> > > Once there's a bug in the compiler, there's > > really no limit in the number of failing tests that can be > > generated, so it's more productive to run the generator after the > > previously reported bugs are fixed. > > Agreed. > > > > > We've also seen cases where the results of "clang -O2" are > > different > > on Mac vs. Linux/Windows. > > I recall an issue related to default settings for FP, and differences > with libm implementation. Are there non-floating-point cases? > > > > > Just let me know when you want a new run. > > Will do! Mohammad, Can you please re-run these now? I know that the original loop-vectorizer bugs causing the miscompiles have been fixed, and the others also seem to have been resolved as well. Thanks again, Hal > > -Hal > > > > > Cheers, > > -moh > > > > -----Original Message----- > > From: Hal Finkel [mailto:hfinkel at anl.gov] > > Sent: Wednesday, June 26, 2013 7:35 AM > > To: Haghighat, Mohammad R > > Cc: llvmdev at cs.uiuc.edu; Jim Grosbach > > Subject: Re: [LLVMdev] [icFuzz] Help needed with analyzing randomly > > generated tests that fail on clang 3.4 trunk > > > > ----- Original Message ----- > > > > > > Hi Moh, > > > > > > > > > Thanks for this. I’m really glad to see the work you’re doing in > > > this > > > area and believe it will be extremely helpful in improving the > > > quality of the compiler. > > > > > > > > > -Jim > > > > > > > > > > > > On Jun 24, 2013, at 4:10 PM, Haghighat, Mohammad R < > > > mohammad.r.haghighat at intel.com > wrote: > > > > > > > > > > > > > > > > > > Hi, > > > > > > I just submitted a bug report with a package containing 107 small > > > test cases that fail on the latest LLVM/clang 3.4 main trunk > > > (184563). Included are test sources, compilation commands, test > > > input files, and results at –O0 and –O2 when applicable. 
> > > > > > http://llvm.org/bugs/show_bug.cgi?id=16431 > > > > > > These tests have been automatically generated by an internal tool > > > at > > > Intel, the Intel Compiler fuzzer, icFuzz. The tests are typically > > > very small. For example, for the following simple loop (test > > > t5702) > > > on MacOS X, clang at –O2 generates a binary that crashes: > > > > > > // Test Loop Interchange > > > for (j = 2; j < 76; j++) { > > > for (jm = 1; jm < 30; jm++) { > > > h[j-1][jm-1] = j + 83; > > > } > > > } > > > > > > The tests are put in to two categories > > > - tests that have different runtime outputs when compiled at -O0 > > > and > > > -O2 (this category also includes runtime crashes) > > > - tests that cause infinite loops in the Clang optimizer > > > > > > Many of these failing tests could be due to the same bug, thus a > > > much > > > smaller number of root problems are expected. > > > > > > Any help with triaging these bugs would be highly appreciated. > > > > I've gone through all of the miscompile cases, used bugpoint to > > reduce them, and opened individual PRs for several distinct bugs. > > So > > far we have: PR16455 (loop vectorizer), PR16457 (sccp), PR16460 > > (instcombine). Thanks again for doing this! Do you plan on > > repeating > > this testing on a regular basis? Can it be automated? 
> > > > -Hal > > > > > > > > Thanks, > > > -moh > > > > > > _______________________________________________ > > > LLVM Developers mailing list > > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > > _______________________________________________ > > > LLVM Developers mailing list > > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > > > -- > > Hal Finkel > > Assistant Computational Scientist > > Leadership Computing Facility > > Argonne National Laboratory > > > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory From tanmx_star at yeah.net Fri Jul 26 08:30:20 2013 From: tanmx_star at yeah.net (Star Tan) Date: Fri, 26 Jul 2013 23:30:20 +0800 (CST) Subject: [LLVMdev] [Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass In-Reply-To: <51F213DB.10700@grosser.es> References: <5ce10763.6ff8.14019233ac7.Coremail.tanmx_star@yeah.net> <51F213DB.10700@grosser.es> Message-ID: <5f13ae5d.a1bc.1401b9a2247.Coremail.tanmx_star@yeah.net> At 2013-07-26 14:14:51,"Tobias Grosser" wrote: >On 07/25/2013 09:01 PM, Star Tan wrote: >> Hi Sebastian, >> >> >> Recently, I found the "Polly - Calculate dependences" pass would lead to significant compile-time overhead when compiling some loop-intensive source code. Tobias told me you found similar problem as follows: >> http://llvm.org/bugs/show_bug.cgi?id=14240 >> >> >> My evaluation shows that "Polly - Calculate dependences" pass consumes 96.4% of total compile-time overhead when compiling the program you proposed. 
It seems that the expensive compile-time overhead comes from those loop operations in ISL library. Some preliminary results can be seen on http://llvm.org/bugs/show_bug.cgi?id=14240. >> >> >> Sebastian, I wonder whether you have further results or suggestions besides those words posted on the Bugzilla page? Any information or suggestion would be appreciated. > >Hi Star Tan, > >thanks for looking into this. Just to sum up, the test case that >shows this slowness is the following: > >void bar(); >void foo(short *input) { > short x0, x1; > char i, ctr; > > for(i = 0; i < 8; i++) { > > // SCoP begin > for (ctr = 0; ctr < 8; ctr++) { > x0 = input[i*64 + ctr*8 + 0] ; > x1 = input[i*64 + ctr*8 + 1] ; > > input[i*64 + ctr*8 + 0] = x0 - x1; > input[i*64 + ctr*8 + 1] = x0 + x1; > input[i*64 + ctr*8 + 2] = x0 * x1; > } > // SCoP end > > bar(); // Unknown function call stops further expansion of SCoP > } >} > >Which is translated to the following scop: > > Context: > [p_0, p_1, p_2] -> { : p_0 >= -2147483648 and p_0 <= 2147483647 >and p_1 >= -2147483648 and p_1 <= 2147483647 and p_2 >= -2147483648 and >p_2 <= 2147483647 } > p_0: {0,+,128}<%for.cond2.preheader> > p_1: {2,+,128}<%for.cond2.preheader> > p_2: {4,+,128}<%for.cond2.preheader> >[...] > >The interesting observation is, that Polly introduces three parameters >(p_0, p_1, p_2) for this SCoP, even though in the C source code only the >variable 'i' is SCoP invariant. However, due to the way >SCEVExpr(essions) in LLVM are nested, Polly sees three scop-invariant >SCEVExpr(essions) which are all translated into independent parameters. >However, as we can see, the only difference between the three >parameters is a different constant in the base of the AddRecExpr. If we >would just introduce p_0 (the parameter where the scev-base is zero) and >express any use of p_1 as p_0 + 2 and p_2 as p_0 + 4, isl could solve >this problem very quickly. 
>There are other ways to improve performance which include further tuning >isl or reducing the precision of the analysis, but in this case >I don't think we need to look into them. The above fix should give us >good performance and additionally also increases the precision of the >result (as isl now understands the relation between the different >parameters). > >To fix this, you probably would like to look into the SCEVAffinator >class and the way parameters are handled. > Thank you so much. Maybe I was in the wrong direction at first -:) Yes, you are right. I should first investigate why Polly introduces three parameters rather than just one parameter. Best wishes, Star Tan -------------- next part -------------- An HTML attachment was scrubbed... URL: From kparzysz at codeaurora.org Fri Jul 26 06:38:49 2013 From: kparzysz at codeaurora.org (Krzysztof Parzyszek) Date: Fri, 26 Jul 2013 08:38:49 -0500 Subject: [LLVMdev] Does nounwind have semantics? In-Reply-To: <51ECDB9B.6080909@free.fr> References: <58281658-BDAE-4D12-859A-859308EE6FF8@apple.com> <51ECDB9B.6080909@free.fr> Message-ID: <51F27BE9.6000008@codeaurora.org> On 7/22/2013 2:13 AM, Duncan Sands wrote: > > For example, you could declare every function to return > two results, the usual one and an additional i1 result. If the i1 value > returned is 1 then this means that "an exception is being unwound", and the > caller should jump to the landing pad if there is one; and if there > isn't then > it itself should return 1. The major problem with this is that you would introduce explicit instructions that would not correspond to any executable code. In practice it may be nearly impossible to eliminate them. If all functions that "may-unwind" were only called via invoke, the landing pad could simply do "resume" to pass control up the call stack. -K -- Qualcomm Innovation Center, Inc. 
is a member of Code Aurora Forum, hosted by The Linux Foundation From rafael.espindola at gmail.com Fri Jul 26 06:10:51 2013 From: rafael.espindola at gmail.com (=?UTF-8?Q?Rafael_Esp=C3=ADndola?=) Date: Fri, 26 Jul 2013 09:10:51 -0400 Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections In-Reply-To: References: <51F1664F.1040003@codeaurora.org> <51F19211.6050403@codeaurora.org> <51F194AB.10909@codeaurora.org> Message-ID: On 26 July 2013 08:39, Rafael Espíndola wrote: > On 25 July 2013 17:24, Rui Ueyama wrote: >> Then how about enable these flags for -O2? I want to hear from other people >> cc'ed, and I may be too cautious, but I'd hesitate to define a new ELF >> section if there's other mean already available to achieve the same thing. Some numbers with -ffunction-section and -fdata-section when building clang: no-gc: clang is 50 152 128 bytes clang link time is: 0m0.623s .o files are 109 061 386 bytes gc clang is 49 824 592 bytes clang link time is: 0m1.369s .o files are 137 607 146 bytes So there is a noticeable overhead in .o size and link time. Maybe we should start by enabling these options at -O3? Cheers, Rafael From kparzysz at codeaurora.org Fri Jul 26 06:01:46 2013 From: kparzysz at codeaurora.org (Krzysztof Parzyszek) Date: Fri, 26 Jul 2013 08:01:46 -0500 Subject: [LLVMdev] -Os In-Reply-To: <6B918A28-DDB7-4D3A-B446-C717B702A3FB@apple.com> References: <51EE6A33.2010203@mips.com> <3431374581995@web12h.yandex.ru> <51EE756F.80503@mips.com> <6B918A28-DDB7-4D3A-B446-C717B702A3FB@apple.com> Message-ID: <51F2733A.3020300@codeaurora.org> On 7/23/2013 1:36 PM, Jim Grosbach wrote: > > This isn’t just a nitpick. This is exactly why you’re seeing > differences. The pass managers aren’t always set up the same, for example. > > FWIW, I feel your pain. This is a long-standing weakness of our > infrastructure. What was the motivation for this design? Was it to save time by not creating another process, or were there other factors? 
The IBM compiler for example, has all of its components in different executables. It makes debugging a lot more convenient (especially when working with large applications built with IPA). -K -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation From rafael.espindola at gmail.com Fri Jul 26 05:39:09 2013 From: rafael.espindola at gmail.com (=?UTF-8?Q?Rafael_Esp=C3=ADndola?=) Date: Fri, 26 Jul 2013 08:39:09 -0400 Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections In-Reply-To: References: <51F1664F.1040003@codeaurora.org> <51F19211.6050403@codeaurora.org> <51F194AB.10909@codeaurora.org> Message-ID: On 25 July 2013 17:24, Rui Ueyama wrote: > Then how about enable these flags for -O2? I want to hear from other people > cc'ed, and I may be too cautious, but I'd hesitate to define a new ELF > section if there's other mean already available to achieve the same thing. I would probably support doing that first. A small annoyance is that the linker requires the --gc-sections option, but most current gnu (bfd and gold) versions support that, so we should be fine at least on linux (and the driver already collects the distro we are in anyway in case we need to change the default for some old distro). Once that is in, the existing proposals for splitting sections into atoms become speed and relocatable object size optimizations. Cheers, Rafael From brianherman at gmail.com Fri Jul 26 04:02:52 2013 From: brianherman at gmail.com (Brian Herman) Date: Fri, 26 Jul 2013 07:02:52 -0400 Subject: [LLVMdev] Upgrading the visual studio from 2008 to 2010 Message-ID: Anyone have any tips on upgrading the llvm visual studio project from 2008 to 2010? -- Thanks, Brian Herman college.nfshost.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From conormacaoidh at gmail.com Fri Jul 26 03:27:46 2013 From: conormacaoidh at gmail.com (Conor Mac Aoidh) Date: Fri, 26 Jul 2013 11:27:46 +0100 Subject: [LLVMdev] Vector DAG Patterns In-Reply-To: <51EE630B.1030002@gmail.com> References: <51EE630B.1030002@gmail.com> Message-ID: <51F24F22.8080800@gmail.com> To elaborate, it is not only cumbersome writing these patterns for vectors of 16 characters (v16i8), it does not work. When I compile with this pattern for an andx operation on v16i8:

[(set RC:$dst,
  (and (i8 (vector_extract (vt VC:$src), 0)),
  (and (i8 (vector_extract (vt VC:$src), 1)),
  (and (i8 (vector_extract (vt VC:$src), 2)),
  (and (i8 (vector_extract (vt VC:$src), 3)),
  (and (i8 (vector_extract (vt VC:$src), 4)),
  (and (i8 (vector_extract (vt VC:$src), 5)),
  (and (i8 (vector_extract (vt VC:$src), 6)),
  (and (i8 (vector_extract (vt VC:$src), 7)),
  (and (i8 (vector_extract (vt VC:$src), 8)),
  (and (i8 (vector_extract (vt VC:$src), 9)),
  (and (i8 (vector_extract (vt VC:$src), 10)),
  (and (i8 (vector_extract (vt VC:$src), 11)),
  (and (i8 (vector_extract (vt VC:$src), 12)),
  (and (i8 (vector_extract (vt VC:$src), 13)),
  (and (i8 (vector_extract (vt VC:$src), 14)),
       (i8 (vector_extract (vt VC:$src), 15))
  ))))))))))))))))]

llvm-tblgen enters an infinite loop which never stops (I left it for ~10 mins before killing it). So either there is another way to express this pattern, or this is a problem with tablegen? Regards --- Conor Mac Aoidh On 23/07/2013 12:03, Conor Mac Aoidh wrote: > Hi All, > > Been having a problem constructing a suitable pattern to represent > some vector operations in the DAG. Stuff like andx/orx operations > where elements of a vector are anded/ored together. > > My approach thus far has been to extract the sub elements of the > vector and and/or those elements. 
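For reference, the semantics this pattern is trying to encode — independent of the TableGen spelling — is a plain horizontal AND reduction over all sixteen lanes. A scalar model of that andx operation:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar model of the andx semantics being pattern-matched: reduce all
// lanes of a v16i8 with a single associative AND. The DAG pattern in the
// mail expresses exactly this as a chain of vector_extract + and nodes,
// which is what makes it so verbose at 16 lanes.
uint8_t andx(const std::array<uint8_t, 16> &V) {
  uint8_t R = V[0];
  for (int i = 1; i < 16; ++i)
    R &= V[i];
  return R;
}
```

An orx variant is identical with `|=` in place of `&=`; the associativity is what lets the nesting order in the pattern be arbitrary.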
This is ok for 4 vectors of i32s, > but becomes cumbersome for v16i8s. Example instruction: > > andx $dst $v1 > > Pattern: > > [(set RC:$dst, > (and (i32 (vector_extract(vt VC:$src), 0 ) ), > (and (i32 (vector_extract(vt VC:$src), 1 ) ), > (and (i32 (vector_extract(vt VC:$src), 2 ) ), > (i32 (vector_extract(vt VC:$src), 3 ) ) > ) > ) > ) > )] > > Is there a better way to do this? > > Regards > --- > Conor Mac Aoidh -------------- next part -------------- An HTML attachment was scrubbed... URL: From noloader at gmail.com Fri Jul 26 01:58:46 2013 From: noloader at gmail.com (Jeffrey Walton) Date: Fri, 26 Jul 2013 04:58:46 -0400 Subject: [LLVMdev] Botan and Android Message-ID: Hi Jack, I'm almost there with Android..... I've actually got the static and dynamic libraries built. I'm choking on the test suite. Do you want to take a shot at working around Android and [embedded, lame] STLport? I would try removing the call to rend() in std::map<_Key, _Tp, _Compare, _Alloc>::rend(). But I really should not modify program code. Fiddling with a Makefile is one thing, but modifying source code is another. 
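The STLport failure shown in the log below is the classic mismatch between map::reverse_iterator and map::const_reverse_iterator: old STLport provides no operator!= across the two types, so `i != results.rend()` fails to compile when the two sides deduce different iterator types. A small reproduction of the usual source-side workaround — spelling out const_reverse_iterator so both sides of the comparison are the same type (the function and its contents are hypothetical, not Botan's actual code):

```cpp
#include <cassert>
#include <map>
#include <string>

// Old STLport cannot compare reverse_iterator with const_reverse_iterator.
// Naming the iterator type explicitly, and caching rend() in a variable of
// that same type, sidesteps the missing operator!= without changing
// behavior. This mirrors the report_results() loop that fails below.
std::string joinDescending(const std::map<double, std::string> &Results) {
  typedef std::map<double, std::string>::const_reverse_iterator RIter;
  std::string Out;
  for (RIter I = Results.rbegin(), E = Results.rend(); I != E; ++I)
    Out += I->second + " ";
  return Out;
}
```

This keeps the change local to one typedef and one loop header, which is about as small a source modification as the limitation allows.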
Jeff arm-linux-androideabi-g++ --sysroot=/opt/android-ndk-r8e//platforms/android-14/arch-arm -I/opt/android-ndk-r8e//sources/cxx-stl/stlport/stlport/ -Ibuild/include -O2 -D_REENTRANT -Wno-long-long -fpermissive -W -Wall -c checks/bench.cpp -o build/checks/bench.o checks/bench.cpp: In function 'void (anonymous namespace)::report_results(const string&, const std::map, std::allocator >, double>&)': checks/bench.cpp:157:102: error: no match for 'operator!=' in 'i != std::map<_Key, _Tp, _Compare, _Alloc>::rend() [with _Key = double, _Tp = std::basic_string, std::allocator >, _Compare = std::less, _Alloc = std::allocator, std::allocator > > >, std::map<_Key, _Tp, _Compare, _Alloc>::reverse_iterator = std::reverse_iterator, std::allocator > >, std::priv::_MapTraitsT, std::allocator > > > > >]()' checks/bench.cpp:157:102: note: candidates are: /opt/android-ndk-r8e//sources/cxx-stl/stlport/stlport/stl/_relops_cont.h:21:1: note: template bool std::operator!=(const std::deque<_Tp, _Alloc>&, const std::deque<_Tp, _Alloc>&) /opt/android-ndk-r8e//sources/cxx-stl/stlport/stlport/stl/_relops_cont.h:21:1: note: template bool std::operator!=(const std::multimap<_Key, _Tp, _Compare, _Alloc>&, const std::multimap<_Key, _Tp, _Compare, _Alloc>&) /opt/android-ndk-r8e//sources/cxx-stl/stlport/stlport/stl/_relops_cont.h:21:1: note: template bool std::operator!=(const std::map<_Key, _Tp, _Compare, _Alloc>&, const std::map<_Key, _Tp, _Compare, _Alloc>&) /opt/android-ndk-r8e//sources/cxx-stl/stlport/stlport/stl/_relops_cont.h:21:1: note: template bool std::operator!=(const std::priv::_Rb_tree<_Key, _Compare, _Value, _KeyOfValue, _Traits, _Alloc>&, const std::priv::_Rb_tree<_Key, _Compare, _Value, _KeyOfValue, _Traits, _Alloc>&) /opt/android-ndk-r8e//sources/cxx-stl/stlport/stlport/stl/_relops_cont.h:21:1: note: template bool std::operator!=(const std::vector<_Tp, _Alloc>&, const std::vector<_Tp, _Alloc>&) /opt/android-ndk-r8e//sources/cxx-stl/stlport/stlport/stl/_istreambuf_iterator.h:118:24: 
note: template bool std::operator!=(const std::istreambuf_iterator<_CharT, _Traits>&, const std::istreambuf_iterator<_CharT, _Traits>&) /opt/android-ndk-r8e//sources/cxx-stl/stlport/stlport/stl/_string_operators.h:473:1: note: template bool std::operator!=(const std::basic_string<_CharT, _Traits, _Alloc>&, const _CharT*) /opt/android-ndk-r8e//sources/cxx-stl/stlport/stlport/stl/_string_operators.h:465:1: note: template bool std::operator!=(const _CharT*, const std::basic_string<_CharT, _Traits, _Alloc>&) /opt/android-ndk-r8e//sources/cxx-stl/stlport/stlport/stl/_string_operators.h:425:1: note: template bool std::operator!=(const std::basic_string<_CharT, _Traits, _Alloc>&, const std::basic_string<_CharT, _Traits, _Alloc>&) /opt/android-ndk-r8e//sources/cxx-stl/stlport/stlport/stl/_alloc.h:384:24: note: template bool std::operator!=(const std::allocator<_T1>&, const std::allocator<_T2>&) /opt/android-ndk-r8e//sources/cxx-stl/stlport/stlport/stl/_iterator.h:124:24: note: template bool std::operator!=(const std::reverse_iterator<_Iterator>&, const std::reverse_iterator<_Iterator>&) /opt/android-ndk-r8e//sources/cxx-stl/stlport/stlport/stl/_pair.h:92:24: note: template bool std::operator!=(const std::pair<_T1, _T2>&, const std::pair<_T1, _T2>&) /opt/android-ndk-r8e//sources/cxx-stl/stlport/stlport/stl/_deque.h:247:1: note: template bool std::priv::operator!=(const std::priv::_Deque_iterator_base<_Tp>&, const std::priv::_Deque_iterator_base<_Tp>&) /opt/android-ndk-r8e//sources/cxx-stl/stlport/stlport/stl/_bvector.h:150:25: note: bool std::priv::operator!=(const std::priv::_Bit_iterator_base&, const std::priv::_Bit_iterator_base&) /opt/android-ndk-r8e//sources/cxx-stl/stlport/stlport/stl/_bvector.h:150:25: note: no known conversion for argument 1 from 'std::map, std::allocator > >::const_reverse_iterator {aka std::reverse_iterator, std::allocator > >, std::priv::_ConstMapTraitsT, std::allocator > > > > >}' to 'const std::priv::_Bit_iterator_base&' From g.franceschetti at 
vidya.it Fri Jul 26 00:05:21 2013 From: g.franceschetti at vidya.it (Giorgio Franceschetti *) Date: Fri, 26 Jul 2013 09:05:21 +0200 Subject: [LLVMdev] Build Clang and LLVM on Win 8 In-Reply-To: References: <51EC3731.1040104@vidya.it> <874nbnk2e8.fsf@wanadoo.es> <87y58zidax.fsf@wanadoo.es> <51ED9B35.4010304@vidya.it> <51EE131F.70008@vidya.it> <87mwpdhrjz.fsf@wanadoo.es> <51F0CADA.9090602@vidya.it> Message-ID: <51F21FB1.2000205@vidya.it> Thanks Greg, I was finally able to compile successfully both on Visual Studio 2012 and Code::Blocks. But, while the compilation done with Visual Studio seems to be complete, with Code::Blocks there are some programs missing (notably clang and clang++)... Anyone can explain me why? Thanks in advance, Giorgio Il 25/07/2013 09.03, Greg_Bedwell at sn.scee.net ha scritto: > Hi Giorgio, > > See here: > http://clang.llvm.org/hacking.html#testingWindows > > grep (and a few other required tools) are in the GnuWin32 tools. > > Thanks, > > Greg Bedwell > SN Systems - Sony Computer Entertainment Group > From tobias at grosser.es Thu Jul 25 23:14:51 2013 From: tobias at grosser.es (Tobias Grosser) Date: Thu, 25 Jul 2013 23:14:51 -0700 Subject: [LLVMdev] [Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass In-Reply-To: <5ce10763.6ff8.14019233ac7.Coremail.tanmx_star@yeah.net> References: <5ce10763.6ff8.14019233ac7.Coremail.tanmx_star@yeah.net> Message-ID: <51F213DB.10700@grosser.es> On 07/25/2013 09:01 PM, Star Tan wrote: > Hi Sebastian, > > > Recently, I found the "Polly - Calculate dependences" pass would lead to significant compile-time overhead when compiling some loop-intensive source code. Tobias told me you found similar problem as follows: > http://llvm.org/bugs/show_bug.cgi?id=14240 > > > My evaluation shows that "Polly - Calculate dependences" pass consumes 96.4% of total compile-time overhead when compiling the program you proposed. 
It seems that the expensive compile-time overhead comes from those loop operations in ISL library. Some preliminary results can be seen on http://llvm.org/bugs/show_bug.cgi?id=14240. > > > Sebastian, I wonder whether you have further results or suggestions besides those words posted on the Bugzilla page? Any information or suggestion would be appreciated. Hi Star Tan, thanks for looking into this. Just to sum up, the test case that shows this slowness is the following: void bar(); void foo(short *input) { short x0, x1; char i, ctr; for(i = 0; i < 8; i++) { // SCoP begin for (ctr = 0; ctr < 8; ctr++) { x0 = input[i*64 + ctr*8 + 0] ; x1 = input[i*64 + ctr*8 + 1] ; input[i*64 + ctr*8 + 0] = x0 - x1; input[i*64 + ctr*8 + 1] = x0 + x1; input[i*64 + ctr*8 + 2] = x0 * x1; } // SCoP end bar(); // Unknown function call stops further expansion of SCoP } } Which is translated to the following scop: Context: [p_0, p_1, p_2] -> { : p_0 >= -2147483648 and p_0 <= 2147483647 and p_1 >= -2147483648 and p_1 <= 2147483647 and p_2 >= -2147483648 and p_2 <= 2147483647 } p_0: {0,+,128}<%for.cond2.preheader> p_1: {2,+,128}<%for.cond2.preheader> p_2: {4,+,128}<%for.cond2.preheader> [...] The interesting observation is, that Polly introduces three parameters (p_0, p_1, p_2) for this SCoP, even though in the C source code only the variable 'i' is SCoP invariant. However, due to the way SCEVExpr(essions) in LLVM are nested, Polly sees three scop-invariant SCEVExpr(essions) which are all translated into independent parameters. However, as we can see, the only difference between the three parameters is a different constant in the base of the AddRecExpr. If we would just introduce p_0 (the parameter where the scev-base is zero) and express any use of p_1 as p_0 + 2 and p_2 as p_0 + 4, isl could solve this problem very quickly. 
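The canonicalization described above — one parameter per distinct stride, with the differing constant bases folded back in as offsets — can be sketched independently of the SCEV machinery. In this model (a simplification of the real SCEVAffinator problem, with hypothetical types), the three SCoP parameters {0,+,128}, {2,+,128}, {4,+,128} collapse to a single canonical parameter plus the offsets 0, 2, and 4:

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical model of the fix: each parameter is an AddRec with a
// constant base and a stride. Instead of one isl parameter per expression,
// keep one canonical parameter per stride (base 0) and rewrite every use
// as "canonical parameter + constant offset".
struct AddRec { long Base; long Stride; };

// Returns, per input, (index of canonical parameter, constant offset);
// CanonicalStrides receives the stride of each canonical parameter.
std::vector<std::pair<unsigned, long>>
canonicalize(const std::vector<AddRec> &Params,
             std::vector<long> &CanonicalStrides) {
  std::map<long, unsigned> ByStride;
  std::vector<std::pair<unsigned, long>> Uses;
  for (const AddRec &P : Params) {
    if (!ByStride.count(P.Stride)) {
      ByStride[P.Stride] = CanonicalStrides.size();
      CanonicalStrides.push_back(P.Stride);
    }
    Uses.push_back(std::make_pair(ByStride[P.Stride], P.Base));
  }
  return Uses;
}
```

With one parameter instead of three, isl no longer has to treat p_1 and p_2 as unrelated unknowns, which is where both the speedup and the precision gain come from.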
There are other ways to improve performance which include further tuning isl or reducing the precision of the analysis, but in this case I don't think we need to look into them. The above fix should give us good performance and additionally also increases the precision of the result (as isl now understands the relation between the different parameters). To fix this, you probably would like to look into the SCEVAffinator class and the way parameters are handled. Cheers, Tobias From tobias at grosser.es Thu Jul 25 21:27:50 2013 From: tobias at grosser.es (Tobias Grosser) Date: Thu, 25 Jul 2013 21:27:50 -0700 Subject: [LLVMdev] Does nounwind have semantics? In-Reply-To: <251BD6D4E6A77E4586B482B33960D2283361BCA1@HASMSX106.ger.corp.intel.com> References: <58281658-BDAE-4D12-859A-859308EE6FF8@apple.com> <51ECB01C.5020605@mxc.ca> <251BD6D4E6A77E4586B482B33960D2283360813B@HASMSX106.ger.corp.intel.com> <51ECDE23.2020004@mxc.ca> <251BD6D4E6A77E4586B482B33960D22833608266@HASMSX106.ger.corp.intel.com> <251BD6D4E6A77E4586B482B33960D2283361BB88@HASMSX106.ger.corp.intel.com> <51F0EB8B.10406@mxc.ca> <251BD6D4E6A77E4586B482B33960D2283361BCA1@HASMSX106.ger.corp.intel.com> Message-ID: <51F1FAC6.3030206@grosser.es> On 07/25/2013 02:14 AM, Kuperstein, Michael M wrote: > Right, will fix the CallInst, copy/pasted from a different case and didn't notice what I was doing, thanks. > > Re LangRef.rst, that's what I meant, I'm still hoping for better suggestions regarding the name... > > As to the conflict - Tobias, feel free to go first, I'll merge. Hi Michael, my patch is in. Feel free to commit your patch on top of it. Also, if you could add a test case into test/Bitcode/attributes.ll, this would be great. Tobi From clattner at apple.com Thu Jul 25 21:07:22 2013 From: clattner at apple.com (Chris Lattner) Date: Thu, 25 Jul 2013 21:07:22 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. 
In-Reply-To: References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> <508C534A-D051-46BA-9C20-8B6D344BA508@apple.com> <3592886B-E580-479A-AF8D-006FD6A4A44A@apple.com> Message-ID: <28838467-2769-4E2B-BF7B-01DDCADA8F54@apple.com> On Jul 25, 2013, at 6:04 PM, David Blaikie wrote: >>> void report(enum Kind, StringRef StringData, enum Classification, StringRef msg) >> >> The idea is that "StringData+Kind" can be used to format something nice in clang, but that "msg" fully covers it for clients that don't know the Kind enum. > > Presumably there are diagnostics that are going to have more than one > parameter, though. ArrayRef? (though this doesn't scale to > providing fancy things like DIVariable parameters, that would possibly > allow Clang to lookup the variable (or function, etc) & print out > cursors, etc) It's possible, but the full generality of any diagnostic can be folded (at some loss of generality) into a single string. To balance complexity and generality vs simplicity, "1" is a pretty decent number. -Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: From tanmx_star at yeah.net Thu Jul 25 21:01:13 2013 From: tanmx_star at yeah.net (Star Tan) Date: Fri, 26 Jul 2013 12:01:13 +0800 (CST) Subject: [LLVMdev] [Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass Message-ID: <5ce10763.6ff8.14019233ac7.Coremail.tanmx_star@yeah.net> Hi Sebastian, Recently, I found the "Polly - Calculate dependences" pass would lead to significant compile-time overhead when compiling some loop-intensive source code. Tobias told me you found similar problem as follows: http://llvm.org/bugs/show_bug.cgi?id=14240 My evaluation shows that "Polly - Calculate dependences" pass consumes 96.4% of total compile-time overhead when compiling the program you proposed. It seems that the expensive compile-time overhead comes from those loop operations in ISL library. 
Some preliminary results can be seen on http://llvm.org/bugs/show_bug.cgi?id=14240. Sebastian, I wonder whether you have further results or suggestions besides those words posted on the Bugzilla page? Any information or suggestion would be appreciated. Thanks, Star Tan -------------- next part -------------- An HTML attachment was scrubbed... URL: From dblaikie at gmail.com Thu Jul 25 18:04:07 2013 From: dblaikie at gmail.com (David Blaikie) Date: Thu, 25 Jul 2013 18:04:07 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> <508C534A-D051-46BA-9C20-8B6D344BA508@apple.com> <3592886B-E580-479A-AF8D-006FD6A4A44A@apple.com> Message-ID: On Thu, Jul 25, 2013 at 5:57 PM, Chris Lattner wrote: > > On Jul 25, 2013, at 5:09 PM, Quentin Colombet wrote: > >> Hi, >> >> I think we have a consensus on how we should report diagnostics now. >> For broader uses, the discussion is still open. >> >> To move forward on the diagnostic part, here is the plan: >> - Extend the current handler with a prototype like: >> void report(enum Kind, enum Classification, const char* msg) >> where >> - Kind is the kind of report: InlineAsm, StackSize, Other. >> - Classification is Error, Warning. >> - msg contains the fall back message to print in case the front-end do not know what to do with the report. >> >> How does this sound? > > Sounds like the right direction, how about extending it a bit more to be: > > >> void report(enum Kind, StringRef StringData, enum Classification, StringRef msg) > > The idea is that "StringData+Kind" can be used to format something nice in clang, but that "msg" fully covers it for clients that don't know the Kind enum. Presumably there are diagnostics that are going to have more than one parameter, though. ArrayRef? 
(though this doesn't scale to providing fancy things like DIVariable parameters, that would possibly allow Clang to lookup the variable (or function, etc) & print out cursors, etc) From clattner at apple.com Thu Jul 25 17:57:07 2013 From: clattner at apple.com (Chris Lattner) Date: Thu, 25 Jul 2013 17:57:07 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: <3592886B-E580-479A-AF8D-006FD6A4A44A@apple.com> References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> <508C534A-D051-46BA-9C20-8B6D344BA508@apple.com> <3592886B-E580-479A-AF8D-006FD6A4A44A@apple.com> Message-ID: On Jul 25, 2013, at 5:09 PM, Quentin Colombet wrote: > Hi, > > I think we have a consensus on how we should report diagnostics now. > For broader uses, the discussion is still open. > > To move forward on the diagnostic part, here is the plan: > - Extend the current handler with a prototype like: > void report(enum Kind, enum Classification, const char* msg) > where > - Kind is the kind of report: InlineAsm, StackSize, Other. > - Classification is Error, Warning. > - msg contains the fall back message to print in case the front-end do not know what to do with the report. > > How does this sound? Sounds like the right direction, how about extending it a bit more to be: > void report(enum Kind, StringRef StringData, enum Classification, StringRef msg) The idea is that "StringData+Kind" can be used to format something nice in clang, but that "msg" fully covers it for clients that don't know the Kind enum. -Chris From qcolombet at apple.com Thu Jul 25 17:09:53 2013 From: qcolombet at apple.com (Quentin Colombet) Date: Thu, 25 Jul 2013 17:09:53 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. 
In-Reply-To: References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> <508C534A-D051-46BA-9C20-8B6D344BA508@apple.com> Message-ID: <3592886B-E580-479A-AF8D-006FD6A4A44A@apple.com> Hi, I think we have a consensus on how we should report diagnostics now. For broader uses, the discussion is still open. To move forward on the diagnostic part, here is the plan: - Extend the current handler with a prototype like: void report(enum Kind, enum Classification, const char* msg) where - Kind is the kind of report: InlineAsm, StackSize, Other. - Classification is Error, Warning. - msg contains the fall back message to print in case the front-end do not know what to do with the report. How does this sound? Thanks again for all the contributions made to this thread so far! Cheers, Quentin On Jul 24, 2013, at 10:30 PM, David Blaikie wrote: > On Wed, Jul 24, 2013 at 10:23 PM, Chris Lattner wrote: >> On Jul 24, 2013, at 10:16 PM, David Blaikie wrote: >>>> How about this: keep the jist of the current API, but drop the "warning"- or >>>> "error"-ness of the API. Instead, the backend just includes an enum value >>>> (plus string message for extra data). The frontend makes the decision of >>>> how to render the diagnostic (or not, dropping them is fine) along with how >>>> to map them onto warning/error or whatever concepts they use. >>> >>> I'm not quite clear from your suggestion whether you're suggesting the >>> backend would produce a complete diagnostic string, or just the >>> parameters - requiring/leaving it up to the frontend to have a full >>> textual string for each backend diagnostic with the right number of >>> placeholders, etc. 
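The hybrid shape being converged on — an enum kind plus a pre-formatted fallback string — can be sketched concretely. Names and formatting here are illustrative only (this is not the committed LLVM API), but they follow the report() prototype and the Kind/Classification enums proposed in this thread:

```cpp
#include <cassert>
#include <string>

// Sketch of the proposed interface: Kind + StringData let a front end such
// as Clang format and localize its own diagnostic; Msg is the
// already-formatted fallback for clients (e.g. llc) that do not know the
// Kind enum. All names are illustrative.
enum Kind { InlineAsm, StackSize, Other };
enum Classification { Error, Warning };

std::string renderDiagnostic(Kind K, const std::string &StringData,
                             Classification C, const std::string &Msg) {
  std::string Prefix = (C == Error) ? "error: " : "warning: ";
  switch (K) {
  case StackSize: // A client that knows this kind formats StringData itself.
    return Prefix + "stack frame size is " + StringData + " bytes";
  default:        // Unknown kinds fall back to the pre-formatted message.
    return Prefix + Msg;
  }
}
```

The default branch is the "free ride": a front end built against an older Kind enum still prints something usable, while a front end that recognizes the kind owns the wording (and localization) of the final diagnostic.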
I'm sort of in two minds about that - I like the >>> idea that frontends keep all the user-rendered text (means >>> localization issues are in one place, the frontend - rather than >>> ending up with english text backend diagnostics rendered in a >>> non-english client (yeah, pretty hypothetical, I'm not sure anyone has >>> localized uses of LLVM)). But this does mean that there's no "free >>> ride" - frontends must have some explicit handling of each backend >>> diagnostic (some crappy worst-case fallback, but it won't be a useful >>> message). >> >> I don't have a specific proposal in mind, other than thinking along the exact same lines as you above. :) >> >> The best approach is probably hybrid: the diagnostic producer can produce *both* a full string like today, as well as an "ID + enum" pair. This way, clang can use the later, but llc (as one example of something we want to keep simple) could print the former, and frontends that get unknown enums could fall back on the full string. Agreed. That is exactly what I tried to express. > > Fair-ish. If it were just for the sake of llc it'd be hard to justify > having the strings in LLVM rather than just in llc itself, but > providing them as fallbacks is probably reasonable/convenient & not > likely to be a technical burden. Hopefully if we wanted that we'd > still put something in Clang to maintain the frontend diagnostic line > rather than letting it slip. > >> >>> & I don't think this avoids the desire to have non-diagnostic >>> callbacks whenever possible (notify of interesting things, frontends >>> can decide whether to use that information to emit a diagnostic based >>> on some criteria or behave differently in another way). >> >> Sure, but we also don't want to block progress in some area because we have a desire to solve a bigger problem. 
> > Sure enough - I think the only reason to pre-empt the bigger problem > is to ensure that the immediate progress doesn't lead to bad > implementations of those bigger issues being committed due to > convenience. Ensuring that the solution we implement now makes it hard > to justify (not hard to /do/ badly, just hard to justify doing it > badly by ensuring that the right solution is convenient/easy) taking > shortcuts later would be good. > > That might just be the difference between having a function pointer > callback for the diagnostic case and instead having a callback type > with the diagnostic callback as the first one, with the intent to add > more in cases where backend-diagnostics aren't the right tool. That > way we have an callback interface we can easily extend. (ideally I'd > love to have two things in the callback interface early, as an example > - but that's not necessary & probably won't happen) > > I don't know enough about the particular things Quentin's planning to > implement to know whether any of them fall into the "probably > shouldn't be an LLVM diagnostic" bag. For now, I just want to provide a way for the back-end to express diagnostic. > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From micah.villmow at smachines.com Thu Jul 25 16:42:11 2013 From: micah.villmow at smachines.com (Micah Villmow) Date: Thu, 25 Jul 2013 23:42:11 +0000 Subject: [LLVMdev] Status of getPointerSize()/getPointerTy() per address space? In-Reply-To: References: <2D2A1070-5DA8-4A4F-8E32-2557E8AE486C@gmail.com> Message-ID: <3947CD34E13C4F4AB2D94AD35AE3FE60070888B9@smi-exchange1.smi.local> It should just be enough to specify the data layout string. 
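For context on "just specify the data layout string": per-address-space pointer sizes are expressed in the module's data layout with `p<addrspace>:<size>:<abi-align>[:<pref-align>]` specifications. A purely illustrative fragment (the sizes here are invented, not from any real target):

```llvm
; Hypothetical layout: 64-bit pointers by default, 32-bit pointers in
; address space 1. Each spec reads p<addrspace>:<size>:<abi>:<pref>.
target datalayout = "e-p:64:64:64-p1:32:32:32-i64:64-n32:64"
```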
Micah From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Justin Holewinski Sent: Thursday, July 25, 2013 12:59 PM To: Matt Arsenault Cc: LLVM Developers Mailing List Subject: Re: [LLVMdev] Status of getPointerSize()/getPointerTy() per address space? Awesome! What will the requirements be for the target? Is it sufficient to just override getPointerTy and add appropriate data layout strings, or will more hooks be needed? On Thu, Jul 25, 2013 at 3:41 PM, Matt Arsenault > wrote: On Jul 25, 2013, at 12:33 , Justin Holewinski > wrote: > Looking through recent additions, it looks like the infrastructure exists for targets to specify a per-address-space pointer size/type, but this does not seem to be used anywhere (SelectionDAGBuilder, legalizer, etc.). What is the status of this support? Is anyone actively working on it? > I'm actively working on this. I think I have most of the target independent parts ready that I'm gradually sending through review. I haven't looked at what needs to be done in SelectionDAG yet. -- Thanks, Justin Holewinski -------------- next part -------------- An HTML attachment was scrubbed... URL: From fang at csl.cornell.edu Thu Jul 25 14:54:48 2013 From: fang at csl.cornell.edu (David Fang) Date: Thu, 25 Jul 2013 17:54:48 -0400 (EDT) Subject: [LLVMdev] arch-specific predefines in LLVM's source Message-ID: Hi all, My recent commit r187027 fixed a simple oversight of forgetting to check for __ppc__ (only checking __powerpc__), which broke my powerpc-apple-darwin8 stage1 tests, since the system gcc only provided __ppc__. I was wondering if this justifies using simpler macros like #define LLVM_PPC (defined(__ppc__) || defined(__powerpc__) ...) #define LLVM_PPC64 (defined(__ppc64__) || defined(__powerpc64__) ...) I've even seen __POWERPC__, _POWER, _ARCH_PPC being tested in conditionals. These proposed standardized macros would only be used in LLVM project sources; there's no reason to export them.
The standardized macros would simplify conditionals and make their use less error-prone. What predefines do other architectures use? What would be a suitable place for these proposed macros? include/llvm/Support/Compiler.h? include/llvm/Support/Arch.h (new)? Fang -- David Fang http://www.csl.cornell.edu/~fang/ From arsenm2 at gmail.com Thu Jul 25 15:01:34 2013 From: arsenm2 at gmail.com (Matt Arsenault) Date: Thu, 25 Jul 2013 15:01:34 -0700 Subject: [LLVMdev] Status of getPointerSize()/getPointerTy() per address space? In-Reply-To: References: <2D2A1070-5DA8-4A4F-8E32-2557E8AE486C@gmail.com> Message-ID: <0EE3FAED-E1C0-4E2B-8B8C-E01D9074FEE9@gmail.com> On Jul 25, 2013, at 12:58 , Justin Holewinski wrote: > Awesome! What will the requirements be for the target? Is it sufficient to just override getPointerTy and add appropriate data layout strings, or will more hooks be needed? > I think that should be it From rkotler at mips.com Thu Jul 25 14:45:07 2013 From: rkotler at mips.com (Reed Kotler) Date: Thu, 25 Jul 2013 14:45:07 -0700 Subject: [LLVMdev] [LNT][Patch] Bug 16261 - lnt incorrectly builds timeit-target when one is using a simulator In-Reply-To: <51EF1D26.6080402@mips.com> References: <51EDE09D.1000106@mips.com> <51EF1D26.6080402@mips.com> Message-ID: <51F19C63.5020908@mips.com> Okay to push this change? On 07/23/2013 05:17 PM, reed kotler wrote: > Hi Daniel, > > In this case we are not using lnt under Qemu user mode for benchmarking; > just as a way to run test-suite to test whether the code is correct. > > Qemu user mode emulates target instructions, but when it gets a Unix > Kernel trap, it uses the host to emulate those. > > For example, file I/O. > > It is possible to run target timeit under qemu and let it launch the app > or a wrapper. > (But it is more limited as to what can be done here under qemu vs under > the host OS directly). > > For time functions, it is also going to use the host to emulate those. 
> > So whether timeit is running under qemu or directly on the host, the > answers regarding time will be the same. > > But running timeit under qemu will be much slower as far as elapsed time > than running it on the host directly. > > We would also need to add some new mechanism to Lnt or the makefiles to > also wrap timeit. > > Reed > > > On 07/23/2013 02:19 PM, Daniel Dunbar wrote: >> Wouldn't it be a more accurate simulation to run timeit-target under >> the emulator as well? Or is that too much to ask? >> >> - Daniel >> >> >> On Mon, Jul 22, 2013 at 6:47 PM, Reed Kotler >> > > wrote: >> >> >> >> Just to clarify: >> >> this is when tests are run under USER mode qemu. >> >> >> On 07/22/2013 04:09 PM, Doug Gilmore wrote: >> >> I attached a patch to lnt that addresses this issue. >> >> The patch adds the --host-compile-tools option, which when >> specified, >> forces compilation of the tools for execution on the host. >> >> This allows lnt to be used for correctness testing when the tests >> are run under QEMU. >> >> Comments? 
Doug >> >> >> >> >> _______________________________________________ >> llvm-commits mailing list >> llvm-commits-Tmj1lob9twqVc3sceRu5cw-XMD5yJDbdMReXY1tMh2IBg at public.gmane.org >> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits >> >> > From shankare at codeaurora.org Thu Jul 25 14:12:11 2013 From: shankare at codeaurora.org (Shankar Easwaran) Date: Thu, 25 Jul 2013 16:12:11 -0500 Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections In-Reply-To: References: <51F1664F.1040003@codeaurora.org> <51F19211.6050403@codeaurora.org> Message-ID: <51F194AB.10909@codeaurora.org> Not all users compile their code with -ffunction-sections and -fdata-sections. This is to handle use cases when libraries use a mix of object files too. On 7/25/2013 4:10 PM, Rui Ueyama wrote: > Is there any reason -ffunction-sections and -fdata-sections wouldn't work? > If it'll work, it may be better to say "if you want to get a better > linker output use these options", rather than defining a new ELF section. > > > On Thu, Jul 25, 2013 at 2:01 PM, Shankar Easwaran > wrote: > >> On 7/25/2013 3:56 PM, Rui Ueyama wrote: >> >>> I think I share the goal with you to make the foundation for better >>> dead-strip, so thank you for suggesting. I'm not sure if marking a section >>> as a whole as "safe" or "unsafe" is the best approach, though. Some >>> comments.
>>> >>> - If the compiler generated code is always "safe", and if we can >>> distinguish it from hand-written assembly code by checking if there's a >>> gap >>> between symbols, can we just assume a section with no gap is always >>> "safe"? >>> >> Gaps could just be caused due to alignment, but the code may be safe, >> which the compiler knows very well. >> >> >> - "Safeness" is not an attribute of the section but of the symbol, I >>> think. The symbol is "safe" if there's no direct reference to the symbol >>> data. All references should go through relocations. A section may contain >>> both "safe" and "unsafe" symbols. >>> >> Sections contain symbols. In the context of ELF, marking sections as >> safe/not is more desirable because of the switches (-ffunction-sections and >> -fdata-sections available already). >> >> >> - How about making the compiler to create a new section for each "safe" >>> atom, as it does for inline functions? >>> >> You already have a switch called -ffunction-sections and -fdata-sections >> to put function and data in seperate sections. >> >> >>> On Thu, Jul 25, 2013 at 10:54 AM, Shankar Easwaran >>> **wrote: >>> >>> Hi, >>>> Currently lld ties up all atoms in a section for ELF together. This >>>> proposal just breaks it by handling it differently. >>>> >>>> *This requires **NO ELF ABI changes. >>>> >>>> **Definitions :-* >>>> >>>> >>>> A section is not considered safe if there is some code that appears to be >>>> present between function boundaries (or) optimizes sections to place data >>>> at the end or beginning of a section (that contains no symbol). >>>> >>>> A section is considered safe if symbols contained within the section have >>>> been associated with their appropriate sizes and there is no data present >>>> between function boundaries. >>>> >>>> Examples of safe sections are, code generated by compilers. >>>> >>>> Examples of unsafe sections are, hand written assembly code. 
>>>> *Changes Needed :-* >>>> >>>> >>>> The change that I am trying to propose is the compiler emits a section, >>>> called (*.safe_sections) *that contains section indices on what sections >>>> >>>> are safe. >>>> >>>> The section would have a SHF_EXCLUDE flag, to prevent other linkers from >>>> consuming this section and making it to the output file. >>>> >>>> Data structure for this :- >>>> >>>> .safe_sections >>>> >>>>
>>>> ... >>>> ... >>>> >>>> >>>> *Advantages >>>> *There are advantages that the atoms within a safe section could just be >>>> >>>> allocated in the output file which means better output file layout, and >>>> Better performance! >>>> >>>> This would also result in more atoms getting gc'ed. >>>> >>>> a) looking at profile information >>>> b) taking a order file >>>> >>>> *Changes needed in the assembler >>>> >>>> *a) add an additional flag in the section for people writing assembly >>>> >>>> code, to mark a section safe or unsafe. >>>> * >>>> **Changes needed in lld >>>> >>>> *a) Read the safe section if its present in the object file >>>> >>>> b) Tie atoms together within a section if the section is not safe >>>> * >>>> *Thanks >>>> >>>> Shankar Easwaran* >>>> >>>> * >>>> >>>> -- >>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, >>>> hosted by the Linux Foundation >>>> >>>> >>>> >> -- >> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted >> by the Linux Foundation >> >> -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation From rnk at google.com Thu Jul 25 14:38:45 2013 From: rnk at google.com (Reid Kleckner) Date: Thu, 25 Jul 2013 14:38:45 -0700 Subject: [LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI Message-ID: Hi LLVM folks, To properly implement pass-by-value in the Microsoft C++ ABI, we need to be able to take the address of an outgoing call argument slot. This is http://llvm.org/PR5064 . Problem ------- On Windows, C structs are pushed right onto the stack in line with the other arguments. In LLVM, we use byval to model this, and it works for C structs. However, C++ records are also passed this way, and reusing byval for C++ records breaks C++ object identity rules. 
In order to implement the ABI properly, we need a way to get the address of the argument slot *before* we start the call, so that we can either construct the object in place on the stack or at least call its copy constructor. This is further complicated by the possibility of nested calls passing arguments by value. A good general case to think about is a binary tree of calls that take two arguments by value and return by value: struct A { int a; }; A foo(A, A); foo(foo(A(), A()), foo(A(), A())); To complete the outer call to foo, we have to adjust the stack for its outgoing arguments before the inner calls to foo, and arrange for the sret pointers to point to those slots. To make this even more complicated, C++ methods are typically callee cleanup (thiscall), but free functions are caller cleanup (cdecl). Features -------- A few weeks ago, I sat down with some folks at Google and we came up with this proposal, which tries to add the minimum set of LLVM IL features to make this possible. 1. Allow alloca instructions to use llvm.stacksave values to indicate scoping. This creates an SSA dependence between the alloca instruction and the stackrestore instruction that prevents optimizers from accidentally reordering them in ways that don't verify. llvm.stacksave in this case is taking on a role similar to CALLSEQ_START in the selection dag. LLVM can also apply this to dynamic allocas from inline functions to ensure that optimizers don't move them. 2. Add an 'alloca' attribute for parameters. Only an alloca value can be passed to a parameter with this attribute. It cannot be bitcasted or GEPed. An alloca can only be passed in this way once. It can be passed as a normal pointer to any number of other functions. Aside from allocas bounded by llvm.stacksave and llvm.stackrestore calls, there can be no allocas between the creation of an alloca passed with this attribute and its associated call. 3. Add a stackrestore field to call and invoke instructions. 
This models calling conventions which do their own cleanup, and ensures that even after optimizations have perturbed the IR, we don't consider the allocas to be live. For caller cleanup conventions, while the callee may have called destructors on its arguments, the allocas can be considered live until the stack restore. Example ------- A single call to foo, assuming it is stdcall, would be lowered something like: %res = alloca %struct.A %base = llvm.stacksave() %arg1 = alloca %struct.A, stackbase %base %arg2 = alloca %struct.A, stackbase %base call @A_ctor(%arg1) call @A_ctor(%arg2) call x86_stdcallcc @foo(%res sret, %arg1 alloca, %arg2 alloca), stackrestore %base If control does not flow through a call or invoke with a stackrestore field, then manual calls to llvm.stackrestore must be emitted before another call or invoke can use an 'alloca' argument. The manual stack restore call ends the lifetime of the allocas. This is necessary to handle unwind edges from argument expression evaluation as well as the case where foo is not callee cleanup. Implementation -------------- By starting out with the stack save and restore intrinsics, we can hopefully approach a slow but working implementation sooner rather than later. The work should mostly be in the verifier, the IR, its parser, and the x86 backend. I don't plan to start working on this immediately, but over the long run this will be really important to support well. --- That's all! Please send feedback! This is admittedly a really complicated feature and I'm sorry for inflicting it on the LLVM community, but it's obviously beyond my control. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ruiu at google.com Thu Jul 25 14:24:56 2013 From: ruiu at google.com (Rui Ueyama) Date: Thu, 25 Jul 2013 14:24:56 -0700 Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections In-Reply-To: <51F194AB.10909@codeaurora.org> References: <51F1664F.1040003@codeaurora.org> <51F19211.6050403@codeaurora.org> <51F194AB.10909@codeaurora.org> Message-ID: Then how about enable these flags for -O2? I want to hear from other people cc'ed, and I may be too cautious, but I'd hesitate to define a new ELF section if there's other mean already available to achieve the same thing. On Thu, Jul 25, 2013 at 2:12 PM, Shankar Easwaran wrote: > Not all users compile their code with -ffunction-sections and > -fdata-sections. > > This is to handle usecases when libraries use a mix of object files too. > > > > On 7/25/2013 4:10 PM, Rui Ueyama wrote: > >> Is there any reason -ffunction-sections and -fdata-sections wouldn't work? >> If it'll work, it may be be better to say "if you want to get a better >> linker output use these options", rather than defining new ELF section. >> >> >> On Thu, Jul 25, 2013 at 2:01 PM, Shankar Easwaran >> **wrote: >> >> On 7/25/2013 3:56 PM, Rui Ueyama wrote: >>> >>> I think I share the goal with you to make the foundation for better >>>> dead-strip, so thank you for suggesting. I'm not sure if marking a >>>> section >>>> as a whole as "safe" or "unsafe" is the best approach, though. Some >>>> comments. >>>> >>>> - If the compiler generated code is always "safe", and if we can >>>> distinguish it from hand-written assembly code by checking if there's a >>>> gap >>>> between symbols, can we just assume a section with no gap is always >>>> "safe"? >>>> >>>> Gaps could just be caused due to alignment, but the code may be safe, >>> which the compiler knows very well. >>> >>> >>> - "Safeness" is not an attribute of the section but of the symbol, I >>> >>>> think. The symbol is "safe" if there's no direct reference to the symbol >>>> data. 
All references should go through relocations. A section may >>>> contain >>>> both "safe" and "unsafe" symbols. >>>> >>>> Sections contain symbols. In the context of ELF, marking sections as >>> safe/not is more desirable because of the switches (-ffunction-sections >>> and >>> -fdata-sections available already). >>> >>> >>> - How about making the compiler to create a new section for each >>> "safe" >>> >>>> atom, as it does for inline functions? >>>> >>>> You already have a switch called -ffunction-sections and >>> -fdata-sections >>> to put function and data in seperate sections. >>> >>> >>> On Thu, Jul 25, 2013 at 10:54 AM, Shankar Easwaran >>>> ****wrote: >>>> >>>> >>>> Hi, >>>> >>>>> Currently lld ties up all atoms in a section for ELF together. This >>>>> proposal just breaks it by handling it differently. >>>>> >>>>> *This requires **NO ELF ABI changes. >>>>> >>>>> **Definitions :-* >>>>> >>>>> >>>>> A section is not considered safe if there is some code that appears to >>>>> be >>>>> present between function boundaries (or) optimizes sections to place >>>>> data >>>>> at the end or beginning of a section (that contains no symbol). >>>>> >>>>> A section is considered safe if symbols contained within the section >>>>> have >>>>> been associated with their appropriate sizes and there is no data >>>>> present >>>>> between function boundaries. >>>>> >>>>> Examples of safe sections are, code generated by compilers. >>>>> >>>>> Examples of unsafe sections are, hand written assembly code. >>>>> >>>>> *Changes Needed :-* >>>>> >>>>> >>>>> The change that I am trying to propose is the compiler emits a section, >>>>> called (*.safe_sections) *that contains section indices on what >>>>> sections >>>>> >>>>> are safe. >>>>> >>>>> The section would have a SHF_EXCLUDE flag, to prevent other linkers >>>>> from >>>>> consuming this section and making it to the output file. >>>>> >>>>> Data structure for this :- >>>>> >>>>> .safe_sections >>>>> >>>>>
>>>>> ... >>>>> ... >>>>> >>>>> >>>>> *Advantages >>>>> *There are advantages that the atoms within a safe section could just >>>>> be >>>>> >>>>> allocated in the output file which means better output file layout, and >>>>> Better performance! >>>>> >>>>> This would also result in more atoms getting gc'ed. >>>>> >>>>> a) looking at profile information >>>>> b) taking a order file >>>>> >>>>> *Changes needed in the assembler >>>>> >>>>> *a) add an additional flag in the section for people writing assembly >>>>> >>>>> code, to mark a section safe or unsafe. >>>>> * >>>>> **Changes needed in lld >>>>> >>>>> *a) Read the safe section if its present in the object file >>>>> >>>>> b) Tie atoms together within a section if the section is not safe >>>>> * >>>>> *Thanks >>>>> >>>>> Shankar Easwaran* >>>>> >>>>> * >>>>> >>>>> -- >>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, >>>>> hosted by the Linux Foundation >>>>> >>>>> >>>>> >>>>> -- >>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted >>> by the Linux Foundation >>> >>> >>> > > -- > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted > by the Linux Foundation > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rafael.espindola at gmail.com Fri Jul 26 10:26:44 2013 From: rafael.espindola at gmail.com (=?UTF-8?Q?Rafael_Esp=C3=ADndola?=) Date: Fri, 26 Jul 2013 13:26:44 -0400 Subject: [LLVMdev] [RFC] Switching make check to use 'set -o pipefail' In-Reply-To: References: Message-ID: > Ok, here is a strong LGTM. =] > > Please make the change, and do the following things to aid out-of-tree > maintainers: > > 1) Add a flag to lit and an option to configure/make (I don't care about > CMake here as that is much less frequently used for out-of-tree work) to > disable pipefail. > I have just fixed the last failures on windows. I have also added documentation and the support for disabling pipefail in a directory. 
The one thing I have not implemented yet is the configure change. The reason is that after thinking a bit about it, it looks like something we don't want to have. What we want to provide is an easy way for people doing out of tree work to get their tests passing after an upgrade. We do want upstream tests to fail for them if they, for example, break opt so that it crashes on exit. This is exactly what the lit.local.cfg provides. A new patch is attached. What do you think? > Also, please send a note (in a new thread) to llvmdev when the switch is > flipped with a reminder about how to disable the new behavior for folks that > can't update their test suite. You'll probably want to flip the switch when > you have time to track down lots of build bot failures. =D OK. > Thanks for making the test infrastructure better, > -Chandler Cheers, Rafael -------------- next part -------------- A non-text attachment was scrubbed... Name: t.patch Type: application/octet-stream Size: 3650 bytes Desc: not available URL: From aaron at aaronballman.com Fri Jul 26 10:34:38 2013 From: aaron at aaronballman.com (Aaron Ballman) Date: Fri, 26 Jul 2013 13:34:38 -0400 Subject: [LLVMdev] Upgrading the visual studio from 2008 to 2010 In-Reply-To: References: Message-ID: CMake should take care of that for you automatically. You should just use the proper generator ("Visual Studio 10" if memory serves me) and the correct solution will be made. HTH! ~Aaron On Fri, Jul 26, 2013 at 7:02 AM, Brian Herman wrote: > Anyone have any tips on upgrading the llvm visual studio project from 2008 > to 2010?
> > -- > > > Thanks, > Brian Herman > college.nfshost.com > > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From micah.villmow at smachines.com Fri Jul 26 10:40:17 2013 From: micah.villmow at smachines.com (Micah Villmow) Date: Fri, 26 Jul 2013 17:40:17 +0000 Subject: [LLVMdev] Vector DAG Patterns In-Reply-To: <51F24F22.8080800@gmail.com> References: <51EE630B.1030002@gmail.com> <51F24F22.8080800@gmail.com> Message-ID: <3947CD34E13C4F4AB2D94AD35AE3FE6007088CD9@smi-exchange1.smi.local> Does your target support v16i8 natively? If not, then the DAG framework will split it up into the largest type your target supports. Why are you attempting to do this in tablegen? This is something that should be trivially done in C++ code. For example: SDValue dst = DAG.getNode() <-- extract element 0 for (1 to vec size-1) { SDValue tmp = DAG.getNode() <-- extract element N dst = DAG.getNode(ISD::AND, DL, MVT::i8, dst, tmp); } return dst Simple 5-6 line helper function does what looks to be a very complex and error prone tablegen pattern. From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Conor Mac Aoidh Sent: Friday, July 26, 2013 3:28 AM To: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Vector DAG Patterns To elaborate, it is not only cumbersome writing these patterns for vectors of 16 characters (v16i8), it does not work. 
When I compile with this pattern for an andx operation on v16i8: [(set RC:$dst, (and (i8 (vector_extract(vt VC:$src), 0 ) ), (and (i8 (vector_extract(vt VC:$src), 1 ) ), (and (i8 (vector_extract(vt VC:$src), 2 ) ), (and (i8 (vector_extract(vt VC:$src), 3 ) ), (and (i8 (vector_extract(vt VC:$src), 4 ) ), (and (i8 (vector_extract(vt VC:$src), 5 ) ), (and (i8 (vector_extract(vt VC:$src), 6 ) ), (and (i8 (vector_extract(vt VC:$src), 7 ) ), (and (i8 (vector_extract(vt VC:$src), 8 ) ), (and (i8 (vector_extract(vt VC:$src), 9 ) ), (and (i8 (vector_extract(vt VC:$src), 10 ) ), (and (i8 (vector_extract(vt VC:$src), 11 ) ), (and (i8 (vector_extract(vt VC:$src), 12 ) ), (and (i8 (vector_extract(vt VC:$src), 13 ) ), (and (i8 (vector_extract(vt VC:$src), 14 ) ), (i8 (vector_extract(vt VC:$src), 15 ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) )] llvm-tblgen enters an infinite loop which never stops (i left it for ~10 mins before killing) So either there is another way to express this pattern, or this is a problem with tablegen? Regards --- Conor Mac Aoidh On 23/07/2013 12:03, Conor Mac Aoidh wrote: Hi All, Been having a problem constructing a suitable pattern to represent some vector operations in the DAG. Stuff like andx/orx operations where elements of a vector are anded/ored together. My approach thus far has been to extract the sub elements of the vector and and/or those elements. This is ok for 4 vectors of i32s, but becomes cumbersome for v16i8s. Example instruction: andx $dst $v1 Pattern: [(set RC:$dst, (and (i32 (vector_extract(vt VC:$src), 0 ) ), (and (i32 (vector_extract(vt VC:$src), 1 ) ), (and (i32 (vector_extract(vt VC:$src), 2 ) ), (i32 (vector_extract(vt VC:$src), 3 ) ) ) ) ) )] Is there a better way to do this? Regards --- Conor Mac Aoidh -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From shankare at codeaurora.org Fri Jul 26 10:43:50 2013 From: shankare at codeaurora.org (Shankar Easwaran) Date: Fri, 26 Jul 2013 12:43:50 -0500 Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections In-Reply-To: References: <51F1664F.1040003@codeaurora.org> <51F19211.6050403@codeaurora.org> <51F194AB.10909@codeaurora.org> Message-ID: <51F2B556.3020109@codeaurora.org> On 7/26/2013 7:39 AM, Rafael Espíndola wrote: > On 25 July 2013 17:24, Rui Ueyama wrote: >> Then how about enable these flags for -O2? I want to hear from other people >> cc'ed, and I may be too cautious, but I'd hesitate to define a new ELF >> section if there's other mean already available to achieve the same thing. > I would probably support doing that first. A small annoyance is that > the linker requires the --gc-sections option, but most current gnu > (bfd and gold) versions support that, so we should be fine at least on > linux (and the driver already collects the distro we are in anyway in > case we need to change the default for some old distro). > > Once that is in, the existing proposals for splitting sections into > atoms become speed and relocatable object size optimizations. I partly agree. Implementing safe sections would be beneficial if you are getting third party libraries or system libraries(which are not usually compiled with -ffunction-sections and -fdata-sections). It would be nice to have -ffunction-sections and -fdata-sections the default at -O2. I am not sure why it was not made the default for all these years though. Thanks Shankar Easwaran -- Qualcomm Innovation Center, Inc. 
is a member of Code Aurora Forum, hosted by the Linux Foundation From cameron.mcinally at nyu.edu Fri Jul 26 10:46:29 2013 From: cameron.mcinally at nyu.edu (Cameron McInally) Date: Fri, 26 Jul 2013 13:46:29 -0400 Subject: [LLVMdev] Vector DAG Patterns In-Reply-To: <51F24F22.8080800@gmail.com> References: <51EE630B.1030002@gmail.com> <51F24F22.8080800@gmail.com> Message-ID: Hey Conor, On Fri, Jul 26, 2013 at 6:27 AM, Conor Mac Aoidh wrote: > To elaborate, it is not only cumbersome writing these patterns for vectors > of 16 characters (v16i8), it does not work. > > When I compile with this pattern for an andx operation on v16i8: > > [(set RC:$dst, > (and (i8 (vector_extract(vt VC:$src), 0 ) ), > (and (i8 (vector_extract(vt VC:$src), 1 ) ), > (and (i8 (vector_extract(vt VC:$src), 2 ) ), > (and (i8 (vector_extract(vt VC:$src), 3 ) ), > (and (i8 (vector_extract(vt VC:$src), 4 ) ), > (and (i8 (vector_extract(vt VC:$src), 5 ) ), > (and (i8 (vector_extract(vt VC:$src), 6 ) ), > (and (i8 (vector_extract(vt VC:$src), 7 ) ), > (and (i8 (vector_extract(vt VC:$src), 8 ) ), > (and (i8 (vector_extract(vt VC:$src), 9 ) ), > (and (i8 (vector_extract(vt VC:$src), 10 ) ), > (and (i8 (vector_extract(vt VC:$src), 11 ) > ), > (and (i8 (vector_extract(vt VC:$src), 12 ) > ), > (and (i8 (vector_extract(vt VC:$src), 13 > ) ), > (and (i8 (vector_extract(vt VC:$src), > 14 ) ), > (i8 (vector_extract(vt VC:$src), 15 > ) ) > > ) > ) > ) > ) > ) > ) > ) > ) > ) > ) > ) > ) > ) > ) > ) > )] > > llvm-tblgen enters an infinite loop which never stops (i left it for ~10 > mins before killing) > > So either there is another way to express this pattern, or this is a problem > with tablegen? > > > Regards > --- > Conor Mac Aoidh > > On 23/07/2013 12:03, Conor Mac Aoidh wrote: > > Hi All, > > Been having a problem constructing a suitable pattern to represent some > vector operations in the DAG. Stuff like andx/orx operations where elements > of a vector are anded/ored together. 
> > My approach thus far has been to extract the sub elements of the vector and > and/or those elements. This is ok for 4 vectors of i32s, but becomes > cumbersome for v16i8s. Example instruction: > > andx $dst $v1 > > Pattern: > > [(set RC:$dst, > (and (i32 (vector_extract(vt VC:$src), 0 ) ), > (and (i32 (vector_extract(vt VC:$src), 1 ) ), > (and (i32 (vector_extract(vt VC:$src), 2 ) ), > (i32 (vector_extract(vt VC:$src), 3 ) ) > ) > ) > ) > )] > > Is there a better way to do this? Tough one... Have you checked out the Horizontal Add lowering code in the x86 backend? See isHorizontalBinOp(...) in llvm/lib/Target/X86/X86ISelLowering.cpp. That might get you going. ;) -Cameron From loupicciano at comcast.net Fri Jul 26 10:51:20 2013 From: loupicciano at comcast.net (Lou Picciano) Date: Fri, 26 Jul 2013 17:51:20 +0000 (UTC) Subject: [LLVMdev] Building libLLVMSupport library - special tricks? Message-ID: <1579126844.1194288.1374861080542.JavaMail.root@sz0093a.westchester.pa.mail.comcast.net> Duncan, Yes, I'd been pretty much performing the same diagnosis in parallel - it seems we just don't have that libLLVMSupport library built. Last night, tried the build of clang a few different ways to get that library built - including specifying the --gcc-toolchain option. No Joy! Which of the clang/llvm source trees provides it? or is it in another source package? tools? Is there a Special Sauce which I must cook up? Lou Picciano (have changed topic title to re-focus on the core issue here!) Hi, On 25/07/13 15:47, Lou Picciano wrote: > Duncan, > Many thanks for your comments. > > The core issue we're running into is this: > > $ GCC=/usr/bin/gcc LLVM_CONFIG=/usr/bin/llvm-config make > Compiling utils/TargetInfo.cpp > Linking TargetInfo > ld: fatal: library -lLLVMSupport: not found llvm-config is supposed to say where the libraries are. 
Take a look at the output of usr/bin/llvm-config --ldflags On my system it outputs -L/usr/local/lib -lrt -ldl -lpthread -lz and indeed libLLVMSupport is there $ ls /usr/local/lib/libLLVMSupport* /usr/local/lib/libLLVMSupport.a > ld: fatal: file processing errors. No output written to TargetInfo > collect2: error: ld returned 1 exit status > > All other gyrations are attempts to shoehorn LLVMSupport into the compile. I've > been sourcing the Makefile and README for hints. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruiu at google.com Fri Jul 26 10:51:48 2013 From: ruiu at google.com (Rui Ueyama) Date: Fri, 26 Jul 2013 10:51:48 -0700 Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections In-Reply-To: <51F2B556.3020109@codeaurora.org> References: <51F1664F.1040003@codeaurora.org> <51F19211.6050403@codeaurora.org> <51F194AB.10909@codeaurora.org> <51F2B556.3020109@codeaurora.org> Message-ID: I think it should also be enabled for -Os, as long as it always produces a binary equivalent or smaller than one without the flags. On Fri, Jul 26, 2013 at 10:43 AM, Shankar Easwaran wrote: > On 7/26/2013 7:39 AM, Rafael Espíndola wrote: > >> On 25 July 2013 17:24, Rui Ueyama wrote: >> >>> Then how about enable these flags for -O2? I want to hear from other >>> people >>> cc'ed, and I may be too cautious, but I'd hesitate to define a new ELF >>> section if there's other mean already available to achieve the same >>> thing. >>> >> I would probably support doing that first. A small annoyance is that >> the linker requires the --gc-sections option, but most current gnu >> (bfd and gold) versions support that, so we should be fine at least on >> linux (and the driver already collects the distro we are in anyway in >> case we need to change the default for some old distro). >> >> Once that is in, the existing proposals for splitting sections into >> atoms become speed and relocatable object size optimizations. >> > I partly agree. 
Implementing safe sections would be beneficial if you are > getting third party libraries or system libraries(which are not usually > compiled with -ffunction-sections and -fdata-sections). > > It would be nice to have -ffunction-sections and -fdata-sections the > default at -O2. I am not sure why it was not made the default for all these > years though. > > Thanks > > Shankar Easwaran > > > -- > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted > by the Linux Foundation > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From grosbach at apple.com Fri Jul 26 11:45:26 2013 From: grosbach at apple.com (Jim Grosbach) Date: Fri, 26 Jul 2013 11:45:26 -0700 Subject: [LLVMdev] -Os In-Reply-To: <51F2733A.3020300@codeaurora.org> References: <51EE6A33.2010203@mips.com> <3431374581995@web12h.yandex.ru> <51EE756F.80503@mips.com> <6B918A28-DDB7-4D3A-B446-C717B702A3FB@apple.com> <51F2733A.3020300@codeaurora.org> Message-ID: On Jul 26, 2013, at 6:01 AM, Krzysztof Parzyszek wrote: > On 7/23/2013 1:36 PM, Jim Grosbach wrote: >> >> This isn’t just a nitpick. This is exactly why you’re seeing >> differences. The pass managers aren’t always set up the same, for example. >> >> FWIW, I feel your pain. This is a long-standing weakness of our >> infrastructure. > > What was the motivation for this design? Was it to save time by not creating another process, or were there other factors? For having clang be one big executable rather than multiple? Yes, compile time was a very big motivator. Serializing and deserializing the intermediate state, etc, is not lightweight enough. > > The IBM compiler for example, has all of its components in different executables. It makes debugging a lot more convenient (especially when working with large applications built with IPA). > Definitely. That’s why we have the separate components as well that theoretically allow us to do that sort of compartmentalized debugging. 
Things get complicated in the cases where there are behavioral differences between running things all the way through in clang vs. using the piece-by-piece tools on the IR (clang -emit-llvm -> opt -> llc -> as). Thankfully, that’s not too common. It’s typically sufficient to specify the right triple and maybe one or two options for setting the cpu or something like that to reproduce an issue. When it is a problem, though, it’s a really unpleasant one. It’s also very related to the sorts of information needed to be passed along for proper LTO, so Bill’s work there will enable us to fix this problem, too.

-Jim

> -K
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From omnia at mailinator.com Fri Jul 26 13:49:25 2013
From: omnia at mailinator.com (Abhinash Jain)
Date: Fri, 26 Jul 2013 13:49:25 -0700 (PDT)
Subject: [LLVMdev] LLVM ERROR : Invalid instruction
Message-ID: <1374871765162-59856.post@n5.nabble.com>

#include <string>
using namespace std;

void foo(string str)
{
}
int main()
{
    string str="aa";
    foo(str);
    return 0;
}

1. clang++ -c -emit-llvm foo.cpp -o foo.ll
2. llc -march=cpp -o foo.ll.cpp foo.ll (executing this command gives the error "Invalid Instruction")

May I know why it is failing at step 2?

--
View this message in context: http://llvm.1065342.n5.nabble.com/LLVM-ERROR-Invalid-instruction-tp59856.html
Sent from the LLVM - Dev mailing list archive at Nabble.com.
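One way to locate the culprit without the CPP backend at all: with exceptions enabled, the IR clang emits for this program contains exception-handling instructions, and a plain text scan finds them. The checker below is a hypothetical illustration, not an LLVM tool, and the IR fragment is hand-abbreviated rather than real clang output:

```python
# Hypothetical checker: scan textual LLVM IR for instructions the CPP
# backend cannot translate (here, just the exception-handling ones).
# The sample IR below is hand-abbreviated, not real clang output.
UNSUPPORTED = {"invoke", "landingpad", "resume"}

def unsupported_instructions(ir_text):
    hits = []
    for lineno, line in enumerate(ir_text.splitlines(), 1):
        if UNSUPPORTED & set(line.split()):
            hits.append((lineno, line.strip()))
    return hits

sample_ir = """\
define i32 @main() {
entry:
  invoke void @foo()
          to label %cont unwind label %lpad
cont:
  ret i32 0
lpad:
  %lp = landingpad { i8*, i32 } personality i8* null cleanup
  resume { i8*, i32 } %lp
}
"""
for lineno, line in unsupported_instructions(sample_ir):
    print(lineno, line)
```

Recompiling the source with -fno-exceptions makes these instructions disappear from the IR, at the cost of disabling exception handling in the generated code.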
From hfinkel at anl.gov Fri Jul 26 14:03:24 2013 From: hfinkel at anl.gov (Hal Finkel) Date: Fri, 26 Jul 2013 16:03:24 -0500 (CDT) Subject: [LLVMdev] arch-specific predefines in LLVM's source In-Reply-To: Message-ID: <713165234.14860488.1374872604203.JavaMail.root@alcf.anl.gov> ----- Original Message ----- > Hi all, > My recent commit r187027 fixed a simple oversight of forgetting to > check for __ppc__ (only checking __powerpc__), which broke my > powerpc-apple-darwin8 stage1 tests, since the system gcc only > provided > __ppc__. I was wondering if this justifies using simpler macros like > > #define LLVM_PPC (defined(__ppc__) || defined(__powerpc__) ...) > #define LLVM_PPC64 (defined(__ppc64__) || defined(__powerpc64__) ...) A general note: Given all of the compiler variance out there regarding whether the 32-bit macros are also defined on the 64-bit systems, I really think that you'll need to arrange these as: #define LLVM_PPC64 (defined(__ppc64__) || defined(__powerpc64__) ...) #define LLVM_PPC !LLVM_PPC64 && (defined(__ppc__) || defined(__powerpc__) ...) > > I've even seen __POWERPC__, _POWER, _ARCH_PPC being tested in > conditionals. > > These proposed standardized macros would only be used in LLVM project > sources; there's no reason to exported them. > The standardized macros would simplify conditionals and make their > use > less error-prone. > > What predefines do other architectures use? Would all uses of these macros be restricted to the PPC backend, or would they appear elsewhere as well? -Hal > > What would be a suitable place for these proposed macros? > include/llvm/Support/Compiler.h? > include/llvm/Support/Arch.h (new)? 
> > Fang > > -- > David Fang > http://www.csl.cornell.edu/~fang/ > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory From tonic at nondot.org Fri Jul 26 14:03:53 2013 From: tonic at nondot.org (Tanya Lattner) Date: Fri, 26 Jul 2013 14:03:53 -0700 Subject: [LLVMdev] Announcement: llvm.org changing name servers Message-ID: <617D8BE0-D0BB-4B67-B735-C7AF6C9B270D@nondot.org> Just letting everyone know that llvm.org will be changing name servers over the next few days to a week. You may experience some strangeness as any bugs are worked out but hopefully you won't notice at all. I will send email once the transition is complete. Thanks, Tanya From chandlerc at google.com Fri Jul 26 14:17:48 2013 From: chandlerc at google.com (Chandler Carruth) Date: Fri, 26 Jul 2013 14:17:48 -0700 Subject: [LLVMdev] [RFC] Switching make check to use 'set -o pipefail' In-Reply-To: References: Message-ID: On Fri, Jul 26, 2013 at 10:26 AM, Rafael Espíndola < rafael.espindola at gmail.com> wrote: > > Ok, here is a strong LGTM. =] > > > > Please make the change, and do the following things to aid out-of-tree > > maintainers: > > > > 1) Add a flag to lit and an option to configure/make (I don't care about > > CMake here as that is much less frequently used for out-of-tree work) to > > disable pipefail. > > > > I have just fixed the last failures on windows. I have also added > documentation and the support for disabling pipefail in a directory. > The one thing I have not implemented yet is the configure change. > > The reason is that after thinking a bit about it it looks like > something we don't want to have. What we want to provide is an easy > way for people doing out of tree work to get their tests passing after > an upgrade. 
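For reference, the semantics being toggled can be modeled in a few lines of pure Python. This is an illustration of shell pipeline exit-status rules, not code from lit:

```python
# Pure-Python model of shell pipeline exit status, showing what
# `set -o pipefail` changes for a test like `opt ... | FileCheck ...`.
# This illustrates the semantics only; it is not code from lit.
def pipeline_status(statuses, pipefail=False):
    """statuses: exit codes of each command in the pipe, left to right."""
    if pipefail:
        # pipefail: the rightmost non-zero status wins;
        # 0 only if every command succeeded.
        failing = [s for s in statuses if s != 0]
        return failing[-1] if failing else 0
    # default shell behavior: only the last command's status matters.
    return statuses[-1]

# `opt` crashes (status 134) but FileCheck still exits 0:
print(pipeline_status([134, 0]))                 # 0: failure hidden
print(pipeline_status([134, 0], pipefail=True))  # 134: failure reported
```

The first case is exactly how a broken `opt` could go unnoticed under the old behavior: the pipeline's status came from FileCheck alone.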
> We do want upstream tests to fail for them if they, for example, break
> opt so that it crashes on exit. This is exactly what the lit.local.cfg
> provides.

I can see your argument here and am fine with it on further thought. Thanks.

> A new patch is attached. What do you think?

Looks good. I might add in the release notes a quick "paste the following into a lit.local.cfg file in your test subtree to turn this off if you need to" bit? I'm just trying to avoid grumbling by giving a recipe for ignoring this change on old out-of-tree targets that aren't likely to be updated to test correctly.

> > Also, please send a note (in a new thread) to llvmdev when the switch is
> > flipped with a reminder about how to disable the new behavior for folks that
> > can't update their test suite. You'll probably want to flip the switch when
> > you have time to track down lots of build bot failures. =D

> OK.

If you can give the recipe in the release notes, I'd paste it into the email as well. Thanks for working on this.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From fang at csl.cornell.edu Fri Jul 26 14:21:00 2013
From: fang at csl.cornell.edu (David Fang)
Date: Fri, 26 Jul 2013 17:21:00 -0400 (EDT)
Subject: [LLVMdev] arch-specific predefines in LLVM's source
In-Reply-To: <713165234.14860488.1374872604203.JavaMail.root@alcf.anl.gov>
References: <713165234.14860488.1374872604203.JavaMail.root@alcf.anl.gov>
Message-ID:

> ----- Original Message -----
>> Hi all,
>> My recent commit r187027 fixed a simple oversight of forgetting to
>> check for __ppc__ (only checking __powerpc__), which broke my
>> powerpc-apple-darwin8 stage1 tests, since the system gcc only
>> provided
>> __ppc__. I was wondering if this justifies using simpler macros like
>>
>> #define LLVM_PPC (defined(__ppc__) || defined(__powerpc__) ...)
>> #define LLVM_PPC64 (defined(__ppc64__) || defined(__powerpc64__) ...)
> > A general note: Given all of the compiler variance out there regarding whether the 32-bit macros are also defined on the 64-bit systems, I really think that you'll need to arrange these as: > > #define LLVM_PPC64 (defined(__ppc64__) || defined(__powerpc64__) ...) > #define LLVM_PPC !LLVM_PPC64 && (defined(__ppc__) || defined(__powerpc__) ...) or define more clearly: #define LLVM_PPC32 (((defined(__ppc__) || defined(__powerpc__)) && !LLVM_PPC64) #define LLVM_PPC_ANY (LLVM_PPC32 || LLVM_PPC64) or #define LLVM_PPC_ANY (defined(__ppc__) || defined(__powerpc__)) #define LLVM_PPC32 (LLVM_PPC_ANY && !LLVM_PPC64) ? >> I've even seen __POWERPC__, _POWER, _ARCH_PPC being tested in >> conditionals. >> >> These proposed standardized macros would only be used in LLVM project >> sources; there's no reason to exported them. >> The standardized macros would simplify conditionals and make their >> use >> less error-prone. >> >> What predefines do other architectures use? > > Would all uses of these macros be restricted to the PPC backend, or > would they appear elsewhere as well? One place I see these predefines outside of the PPC backend is lib/Support/Host.cpp Fang > -Hal > >> >> What would be a suitable place for these proposed macros? >> include/llvm/Support/Compiler.h? >> include/llvm/Support/Arch.h (new)? >> >> Fang -- David Fang http://www.csl.cornell.edu/~fang/ From grosbach at apple.com Fri Jul 26 14:50:33 2013 From: grosbach at apple.com (Jim Grosbach) Date: Fri, 26 Jul 2013 14:50:33 -0700 Subject: [LLVMdev] LLVM ERROR : Invalid instruction In-Reply-To: <1374871765162-59856.post@n5.nabble.com> References: <1374871765162-59856.post@n5.nabble.com> Message-ID: <8CE6C95B-6581-4442-8516-06E001B5FF24@apple.com> Looks like the CPP backend doesn’t know how to deal with exception handling. It’s complaining that it can’t handle a landingpad instruction. 
-Jim On Jul 26, 2013, at 1:49 PM, Abhinash Jain wrote: > #include > #include > #include > #include > using namespace std; > > void foo(string str) > { > } > int main() > { > string str="aa"; > foo(str); > return 0; > } > > 1. clang++ -c -emit-llvm foo.cpp -o foo.ll > 2. llc -march=cpp -o foo.ll.cpp foo.ll (at the execution of this command > its giving an error as "Invalid Instruction") > > May I know why is it failing on step 2. ??? > > > > -- > View this message in context: http://llvm.1065342.n5.nabble.com/LLVM-ERROR-Invalid-instruction-tp59856.html > Sent from the LLVM - Dev mailing list archive at Nabble.com. > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From micah.villmow at smachines.com Fri Jul 26 15:26:22 2013 From: micah.villmow at smachines.com (Micah Villmow) Date: Fri, 26 Jul 2013 22:26:22 +0000 Subject: [LLVMdev] Question about SparseMultiSet Message-ID: <3947CD34E13C4F4AB2D94AD35AE3FE6007089092@smi-exchange1.smi.local> Does anyone know if an insertion invalidates the end() iterator? The documentation in ADT/SparseMultiSet.h mentions that removal only invalidates the iterator of the object being removed, but nothing on insertion. My understanding from reading the code seems to me that it doesn't invalidate it, but I want to make sure as I'm not 100% sure here. Thanks, Micah -------------- next part -------------- An HTML attachment was scrubbed... URL: From rkotler at mips.com Fri Jul 26 15:33:31 2013 From: rkotler at mips.com (reed kotler) Date: Fri, 26 Jul 2013 15:33:31 -0700 Subject: [LLVMdev] floor Message-ID: <51F2F93B.3040102@mips.com> I'm getting some problems because it seems that the compiler is treating "floor" differently from other math library functions like "sin". 
The Args and RetVal have the parameter and return types marked as void.

For mips16, it's important that I be able to know the original signature for floating point functions. In some cases, I need to create calls to helper functions based on the signatures.

In newer code I've written, I've actually moved this logic to an IR pass, and in that case I know for sure. But this part of the code is in ISelLowering code, and I rely on getting the proper signature information.

I'm looking at llvm now to see how this is occurring but maybe someone just knows.

Tia.

Reed

From rafael.espindola at gmail.com Fri Jul 26 15:35:15 2013
From: rafael.espindola at gmail.com (=?UTF-8?Q?Rafael_Esp=C3=ADndola?=)
Date: Fri, 26 Jul 2013 18:35:15 -0400
Subject: [LLVMdev] Regression tests now using pipefail
Message-ID:

I just committed a change (r187261) to use pipefail when running the regression tests. This means that tests that use pipes like

opt .... | FileCheck

will now fail if opt fails. This would have avoided some test bitrot in the past. For example, we had tests doing

opt -S ... | not grep ...

and they were still passing even after the test itself was no longer passing the verifier.

If you have out of tree tests that depend on the old behavior, don't worry, it is still available. Just add

config.pipefail = False

to the lit.local.cfg of the corresponding directory.

Cheers,
Rafael

From omnia at mailinator.com Fri Jul 26 15:48:27 2013
From: omnia at mailinator.com (Abhinash Jain)
Date: Fri, 26 Jul 2013 15:48:27 -0700 (PDT)
Subject: [LLVMdev] LLVM ERROR : Invalid instruction
In-Reply-To: <8CE6C95B-6581-4442-8516-06E001B5FF24@apple.com>
References: <1374871765162-59856.post@n5.nabble.com>
 <8CE6C95B-6581-4442-8516-06E001B5FF24@apple.com>
Message-ID: <1374878907385-59865.post@n5.nabble.com>

@Jim Grosbach, Is there any way to resolve it?
-- View this message in context: http://llvm.1065342.n5.nabble.com/LLVM-ERROR-Invalid-instruction-tp59856p59865.html Sent from the LLVM - Dev mailing list archive at Nabble.com. From rkotler at mips.com Fri Jul 26 15:59:35 2013 From: rkotler at mips.com (Reed Kotler) Date: Fri, 26 Jul 2013 15:59:35 -0700 Subject: [LLVMdev] floor In-Reply-To: <51F2F93B.3040102@mips.com> References: <51F2F93B.3040102@mips.com> Message-ID: <51F2FF57.8070906@mips.com> Here is a test case: extern double floor(double); extern double floor_(double); double x = 1.5; double y, y_; void foo() { double y = floor(x); double y_ = floor_(x); } If I compile this for Mips16, it calls the proper helper function for floor_ but not for floor, because the signature for floor in callee info is wrong. Args[0] = void RetTy = void /local/llvmpb_config/install/bin/clang -target mipsel-linux-gnu floor1.c -o floor1.s -mips16 -S -fPIC ..... lw $3, %got(x)($2) lw $4, 0($3) lw $5, 4($3) lw $6, %call16(floor)($2) move $25, $6 move $gp, $2 sw $2, 20 ( $16 ); sw $3, 16 ( $16 ); jalrc $6 sw $3, 36($16) sw $2, 32($16) lw $2, 16 ( $16 ); lw $5, 4($2) lw $4, 0($2) lw $3, 20 ( $16 ); lw $2, %call16(floor_)($3) lw $6, %got(__mips16_call_stub_df_2)($3) move $gp, $3 jalrc $6 .... On 07/26/2013 03:33 PM, reed kotler wrote: > I'm getting some problems because it seems that the compiler is treating > "floor" differently from other math library functions like "sin". > > The Args and RetVal have the parameter and return types marked as void. > > For mips16, it's important that I be able to know the original signature > for floating point functions. > > In some cases, need to create calls to helper functions based on the > signatures. > > In newer code I've written, I've actually moved this logica to an IR > pass and in that case I know for sure. > > But this part of the code is in ISelLowering code and I rely on getting > the proper signature information. 
> > I'm looking at llvm now to see how this is occurring but maybe someone
> > just knows.
> >
> > Tia.
> >
> > Reed

From shankare at codeaurora.org Fri Jul 26 16:03:47 2013
From: shankare at codeaurora.org (Shankar Easwaran)
Date: Fri, 26 Jul 2013 18:03:47 -0500
Subject: [LLVMdev] Command line options being put in Target backend libraries
Message-ID: <51F30053.8070102@codeaurora.org>

Hi,

I see a lot of command line options being set in Target backend libraries. The problem with that is if a third party tool links with Target libraries and has a command line option that needs to be processed, the option in the Target libraries will get overridden.

$ cd llvm/lib/Target
$ grep 'cl::' */*.cpp --> produces a lot of such occurrences.

For example:

libLLVMX86CodeGen.a contains

libLLVMX86CodeGen.a:X86RegisterInfo.cpp.o:0000000000000080 b EnableBasePointer

I think those command line options would need to be moved to the drivers that are using them, isn't it? Am I mistaken?

Thanks

Shankar Easwaran

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation

From s at pahtak.org Fri Jul 26 16:03:59 2013
From: s at pahtak.org (Stephen Checkoway)
Date: Fri, 26 Jul 2013 19:03:59 -0400
Subject: [LLVMdev] LLVM ERROR : Invalid instruction
In-Reply-To: <1374878907385-59865.post@n5.nabble.com>
References: <1374871765162-59856.post@n5.nabble.com>
 <8CE6C95B-6581-4442-8516-06E001B5FF24@apple.com>
 <1374878907385-59865.post@n5.nabble.com>
Message-ID: <7C914165-6AF7-4948-8E39-63E4A44D47D9@pahtak.org>

On Jul 26, 2013, at 6:48 PM, Abhinash Jain wrote:
> Is there any way to resolve it?

1. Teach the cpp backend how to handle it.
2. Compile with -fno-exceptions to turn off exceptions.
-- Stephen Checkoway From grosbach at apple.com Fri Jul 26 16:06:21 2013 From: grosbach at apple.com (Jim Grosbach) Date: Fri, 26 Jul 2013 16:06:21 -0700 Subject: [LLVMdev] LLVM ERROR : Invalid instruction In-Reply-To: <7C914165-6AF7-4948-8E39-63E4A44D47D9@pahtak.org> References: <1374871765162-59856.post@n5.nabble.com> <8CE6C95B-6581-4442-8516-06E001B5FF24@apple.com> <1374878907385-59865.post@n5.nabble.com> <7C914165-6AF7-4948-8E39-63E4A44D47D9@pahtak.org> Message-ID: <8EC34178-A47C-47F3-A3C3-1689ABEE7419@apple.com> On Jul 26, 2013, at 4:03 PM, Stephen Checkoway wrote: > > On Jul 26, 2013, at 6:48 PM, Abhinash Jain wrote: > >> Is there anyway to resolve it??? > > > 1. Teach the cpp backend how to handle it. > 2. Compile with -fno-exceptions to turn off exceptions. > Yep. > -- > Stephen Checkoway > > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias at grosser.es Fri Jul 26 16:19:25 2013 From: tobias at grosser.es (Tobias Grosser) Date: Fri, 26 Jul 2013 16:19:25 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: <3592886B-E580-479A-AF8D-006FD6A4A44A@apple.com> References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> <508C534A-D051-46BA-9C20-8B6D344BA508@apple.com> <3592886B-E580-479A-AF8D-006FD6A4A44A@apple.com> Message-ID: <51F303FD.6000407@grosser.es> On 07/25/2013 05:09 PM, Quentin Colombet wrote: > Hi, > > I think we have a consensus on how we should report diagnostics now. > For broader uses, the discussion is still open. 
> > To move forward on the diagnostic part, here is the plan: > - Extend the current handler with a prototype like: > void report(enum Kind, enum Classification, const char* msg) > where > - Kind is the kind of report: InlineAsm, StackSize, Other. > - Classification is Error, Warning. > - msg contains the fall back message to print in case the front-end do not know what to do with the report. Hello Quentin, could you explain how plugins would use your infrastructure? Would it be possible for a plugin to provide (limited) warnings/diagnostics without requiring clang/LLVM to be adapted? Cheers, Tobias From qcolombet at apple.com Fri Jul 26 16:50:10 2013 From: qcolombet at apple.com (Quentin Colombet) Date: Fri, 26 Jul 2013 16:50:10 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: <51F303FD.6000407@grosser.es> References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> <508C534A-D051-46BA-9C20-8B6D344BA508@apple.com> <3592886B-E580-479A-AF8D-006FD6A4A44A@apple.com> <51F303FD.6000407@grosser.es> Message-ID: Hi Tobias, On Jul 26, 2013, at 4:19 PM, Tobias Grosser wrote: > On 07/25/2013 05:09 PM, Quentin Colombet wrote: >> Hi, >> >> I think we have a consensus on how we should report diagnostics now. >> For broader uses, the discussion is still open. >> >> To move forward on the diagnostic part, here is the plan: >> - Extend the current handler with a prototype like: >> void report(enum Kind, enum Classification, const char* msg) >> where >> - Kind is the kind of report: InlineAsm, StackSize, Other. >> - Classification is Error, Warning. >> - msg contains the fall back message to print in case the front-end do not know what to do with the report. > > Hello Quentin, > > could you explain how plugins would use your infrastructure? I am not familiar with how plug-ins work with LLVM, but let me try to sketch something. 
With the proposed infrastructure, LLVMContext will supply a callback to report events of special interest to the LLVM clients (clang, etc.). As long as the plugin has access to the LLVMContext, I do not see why it should not be able to report its own events. See the example below. Note that the proposed prototype changed recently to (see Chris’ email): void report(enum Kind, StringRef StringData, enum Classification, StringRef msg) > Would it be possible for a plugin to provide (limited) warnings/diagnostics without requiring clang/LLVM to be adapted? I think so. Typically, to report a warning, I would do from the plug-in: .report(Other, “Whatever”, Warning, “Weird stuff during plug-in analysis.”); For the explanation I quote Chris: > The idea is that "StringData+Kind" can be used to format something nice in clang, but that "msg" fully covers it for clients that don't know the Kind enum. Hope this helps! Cheers, -Quentin -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias at grosser.es Fri Jul 26 16:59:43 2013 From: tobias at grosser.es (Tobias Grosser) Date: Fri, 26 Jul 2013 16:59:43 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> <508C534A-D051-46BA-9C20-8B6D344BA508@apple.com> <3592886B-E580-479A-AF8D-006FD6A4A44A@apple.com> <51F303FD.6000407@grosser.es> Message-ID: <51F30D6F.7030607@grosser.es> On 07/26/2013 04:50 PM, Quentin Colombet wrote: > Hi Tobias, > > On Jul 26, 2013, at 4:19 PM, Tobias Grosser wrote: > >> On 07/25/2013 05:09 PM, Quentin Colombet wrote: >>> Hi, >>> >>> I think we have a consensus on how we should report diagnostics now. >>> For broader uses, the discussion is still open. 
>>> >>> To move forward on the diagnostic part, here is the plan: >>> - Extend the current handler with a prototype like: >>> void report(enum Kind, enum Classification, const char* msg) >>> where >>> - Kind is the kind of report: InlineAsm, StackSize, Other. >>> - Classification is Error, Warning. >>> - msg contains the fall back message to print in case the front-end do not know what to do with the report. >> >> Hello Quentin, >> >> could you explain how plugins would use your infrastructure? > I am not familiar with how plug-ins work with LLVM, but let me try to sketch something. > > With the proposed infrastructure, LLVMContext will supply a callback to report events of special interest to the LLVM clients (clang, etc.). > As long as the plugin has access to the LLVMContext, I do not see why it should not be able to report its own events. > See the example below. > > Note that the proposed prototype changed recently to (see Chris’ email): > void report(enum Kind, StringRef StringData, enum Classification, StringRef msg) > >> Would it be possible for a plugin to provide (limited) warnings/diagnostics without requiring clang/LLVM to be adapted? > I think so. > Typically, to report a warning, I would do from the plug-in: > .report(Other, “Whatever”, Warning, “Weird stuff during plug-in analysis.”); That sounds good. The last question I have is, for what the 'Kind' parameter is used? Should plugins just use a generic kind 'Other' or would it make sense to support per plugin kinds. Specifically, I wonder how users could enable/disable certain warnings/reports. Tobias From qcolombet at apple.com Fri Jul 26 17:07:13 2013 From: qcolombet at apple.com (Quentin Colombet) Date: Fri, 26 Jul 2013 17:07:13 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. 
In-Reply-To: <51F30D6F.7030607@grosser.es> References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> <508C534A-D051-46BA-9C20-8B6D344BA508@apple.com> <3592886B-E580-479A-AF8D-006FD6A4A44A@apple.com> <51F303FD.6000407@grosser.es> <51F30D6F.7030607@grosser.es> Message-ID: <19E0FB3B-7CF8-449A-917F-C8E089CF1AA8@apple.com> On Jul 26, 2013, at 4:59 PM, Tobias Grosser wrote: > On 07/26/2013 04:50 PM, Quentin Colombet wrote: >> Hi Tobias, >> >> On Jul 26, 2013, at 4:19 PM, Tobias Grosser wrote: >> >>> On 07/25/2013 05:09 PM, Quentin Colombet wrote: >>>> Hi, >>>> >>>> I think we have a consensus on how we should report diagnostics now. >>>> For broader uses, the discussion is still open. >>>> >>>> To move forward on the diagnostic part, here is the plan: >>>> - Extend the current handler with a prototype like: >>>> void report(enum Kind, enum Classification, const char* msg) >>>> where >>>> - Kind is the kind of report: InlineAsm, StackSize, Other. >>>> - Classification is Error, Warning. >>>> - msg contains the fall back message to print in case the front-end do not know what to do with the report. >>> >>> Hello Quentin, >>> >>> could you explain how plugins would use your infrastructure? >> I am not familiar with how plug-ins work with LLVM, but let me try to sketch something. >> >> With the proposed infrastructure, LLVMContext will supply a callback to report events of special interest to the LLVM clients (clang, etc.). >> As long as the plugin has access to the LLVMContext, I do not see why it should not be able to report its own events. >> See the example below. >> >> Note that the proposed prototype changed recently to (see Chris’ email): >> void report(enum Kind, StringRef StringData, enum Classification, StringRef msg) >> >>> Would it be possible for a plugin to provide (limited) warnings/diagnostics without requiring clang/LLVM to be adapted? >> I think so. 
>> Typically, to report a warning, I would do from the plug-in: >> .report(Other, “Whatever”, Warning, “Weird stuff during plug-in analysis.”); > > That sounds good. > > The last question I have is, for what the 'Kind' parameter is used? Should plugins just use a generic kind 'Other' or would it make sense to support per plugin kinds. Specifically, I wonder how users could enable/disable certain warnings/reports. That is a good question! The basic idea is to add a value in the enum for each event we think it would be nice to have a special handling for (e.g., a new warning group in the front-end). The exact values of the Kind enum is something that needs to be discussed, but anyway, it can be extended in the future. -Quentin -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias at grosser.es Fri Jul 26 17:29:18 2013 From: tobias at grosser.es (Tobias Grosser) Date: Fri, 26 Jul 2013 17:29:18 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: <19E0FB3B-7CF8-449A-917F-C8E089CF1AA8@apple.com> References: <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> <508C534A-D051-46BA-9C20-8B6D344BA508@apple.com> <3592886B-E580-479A-AF8D-006FD6A4A44A@apple.com> <51F303FD.6000407@grosser.es> <51F30D6F.7030607@grosser.es> <19E0FB3B-7CF8-449A-917F-C8E089CF1AA8@appl! e.com> Message-ID: <51F3145E.20400@grosser.es> On 07/26/2013 05:07 PM, Quentin Colombet wrote: > On Jul 26, 2013, at 4:59 PM, Tobias Grosser wrote: > >> On 07/26/2013 04:50 PM, Quentin Colombet wrote: >>> Hi Tobias, >>> >>> On Jul 26, 2013, at 4:19 PM, Tobias Grosser wrote: >>> >>>> On 07/25/2013 05:09 PM, Quentin Colombet wrote: >>>>> Hi, >>>>> >>>>> I think we have a consensus on how we should report diagnostics now. >>>>> For broader uses, the discussion is still open. 
>>>>> >>>>> To move forward on the diagnostic part, here is the plan: >>>>> - Extend the current handler with a prototype like: >>>>> void report(enum Kind, enum Classification, const char* msg) >>>>> where >>>>> - Kind is the kind of report: InlineAsm, StackSize, Other. >>>>> - Classification is Error, Warning. >>>>> - msg contains the fall back message to print in case the front-end do not know what to do with the report. >>>> >>>> Hello Quentin, >>>> >>>> could you explain how plugins would use your infrastructure? >>> I am not familiar with how plug-ins work with LLVM, but let me try to sketch something. >>> >>> With the proposed infrastructure, LLVMContext will supply a callback to report events of special interest to the LLVM clients (clang, etc.). >>> As long as the plugin has access to the LLVMContext, I do not see why it should not be able to report its own events. >>> See the example below. >>> >>> Note that the proposed prototype changed recently to (see Chris’ email): >>> void report(enum Kind, StringRef StringData, enum Classification, StringRef msg) >>> >>>> Would it be possible for a plugin to provide (limited) warnings/diagnostics without requiring clang/LLVM to be adapted? >>> I think so. >>> Typically, to report a warning, I would do from the plug-in: >>> .report(Other, “Whatever”, Warning, “Weird stuff during plug-in analysis.”); >> >> That sounds good. >> >> The last question I have is, for what the 'Kind' parameter is used? Should plugins just use a generic kind 'Other' or would it make sense to support per plugin kinds. Specifically, I wonder how users could enable/disable certain warnings/reports. > That is a good question! > > The basic idea is to add a value in the enum for each event we think it would be nice to have a special handling for (e.g., a new warning group in the front-end). > > The exact values of the Kind enum is something that needs to be discussed, but anyway, it can be extended in the future. Alright. 
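The proposed hook is small enough to mock up end to end. The sketch below is a Python toy, not the real C++ API: the Kind and Classification values and the report() signature come from this thread, while the class names and wiring are invented for illustration:

```python
# Toy mock-up of the proposed reporting hook; not the real LLVM API.
# Kind/Classification values and the report() signature follow the
# thread; everything else is invented for illustration.
from enum import Enum

class Kind(Enum):
    InlineAsm = 1
    StackSize = 2
    Other = 3

class Classification(Enum):
    Error = 1
    Warning = 2

class Context:
    """Stands in for LLVMContext: owns one client-supplied callback."""
    def __init__(self, handler=None):
        # Fallback handler just prints the generic message.
        self.handler = handler or (
            lambda kind, data, cls, msg: print(cls.name + ": " + msg))

    def report(self, kind, string_data, classification, msg):
        self.handler(kind, string_data, classification, msg)

# A "plugin" only needs the context to emit its own diagnostics:
received = []
ctx = Context(handler=lambda *args: received.append(args))
ctx.report(Kind.Other, "Whatever", Classification.Warning,
           "Weird stuff during plug-in analysis.")
print(received[0][3])
```

The point of the exercise: the plugin never touches clang; it only needs the context object, and a front end that does not recognize the Kind can still print the fallback message.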
I believe to make it real nice for plugins, it would be great to eventually allow them to register new warning groups on demand. However, this is unrelated to the discussion. Having some infrastructure to output generic 'warnings' is a great start. People can then propose patches that make it even nicer for plugins. Thanks for working on this, Tobi From dberlin at dberlin.org Fri Jul 26 17:48:45 2013 From: dberlin at dberlin.org (Daniel Berlin) Date: Fri, 26 Jul 2013 17:48:45 -0700 Subject: [LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize. In-Reply-To: <20130726230756.2F53B352400B@llvm.org> References: <20130726230756.2F53B352400B@llvm.org> Message-ID: Hey Nadav, I'd humbly suggest that rather than use 3 directly, you should add a shared constant between these two passes, so when one changes, the other doesn't need to be updated. It would also ensure this bit of info about what needs to be updated isn't only contained in the comments.. On Fri, Jul 26, 2013 at 4:07 PM, Nadav Rotem wrote: > Author: nadav > Date: Fri Jul 26 18:07:55 2013 > New Revision: 187267 > > URL: http://llvm.org/viewvc/llvm-project?rev=187267&view=rev > Log: > SLP Vectorier: Don't vectorize really short chains because they are > already handled by the SelectionDAG store-vectorizer, which does a better > job in deciding when to vectorize. 
> > Modified: llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp > URL: > http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp?rev=187267&r1=187266&r2=187267&view=diff > > ============================================================================== > --- llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp (original) > +++ llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp Fri Jul 26 > 18:07:55 2013 > @@ -898,8 +898,12 @@ int BoUpSLP::getTreeCost() { > DEBUG(dbgs() << "SLP: Calculating cost for tree of size " << > VectorizableTree.size() << ".\n"); > > - if (!VectorizableTree.size()) { > - assert(!ExternalUses.size() && "We should not have any external > users"); > + // Don't vectorize tiny trees. Small load/store chains or consecutive > stores > + // of constants will be vectoried in SelectionDAG in > MergeConsecutiveStores. > + if (VectorizableTree.size() < 3) { > + if (!VectorizableTree.size()) { > + assert(!ExternalUses.size() && "We should not have any external > users"); > + } > return 0; > } > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nrotem at apple.com Fri Jul 26 19:56:50 2013 From: nrotem at apple.com (Nadav Rotem) Date: Fri, 26 Jul 2013 19:56:50 -0700 Subject: [LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize. In-Reply-To: References: <20130726230756.2F53B352400B@llvm.org> Message-ID: Hi Daniel, Maybe my commit message was not clear. The idea is that the SelectionDAG store vectorizer can only handle pairs. So, the number three means "more than a pair". 
Thanks, Nadav Sent from my iPhone > On Jul 26, 2013, at 17:48, Daniel Berlin wrote: > > Hey Nadav, > I'd humbly suggest that rather than use 3 directly, you should add a shared constant between these two passes, so when one changes, the other doesn't need to be updated. It would also ensure this bit of info about what needs to be updated isn't only contained in the comments.. > >> On Fri, Jul 26, 2013 at 4:07 PM, Nadav Rotem wrote: >> Author: nadav >> Date: Fri Jul 26 18:07:55 2013 >> New Revision: 187267 >> >> URL: http://llvm.org/viewvc/llvm-project?rev=187267&view=rev >> Log: >> SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize. > >> >> Modified: llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp >> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp?rev=187267&r1=187266&r2=187267&view=diff >> ============================================================================== >> --- llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp (original) >> +++ llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp Fri Jul 26 18:07:55 2013 >> @@ -898,8 +898,12 @@ int BoUpSLP::getTreeCost() { >> DEBUG(dbgs() << "SLP: Calculating cost for tree of size " << >> VectorizableTree.size() << ".\n"); >> >> - if (!VectorizableTree.size()) { >> - assert(!ExternalUses.size() && "We should not have any external users"); >> + // Don't vectorize tiny trees. Small load/store chains or consecutive stores >> + // of constants will be vectoried in SelectionDAG in MergeConsecutiveStores. >> + if (VectorizableTree.size() < 3) { >> + if (!VectorizableTree.size()) { >> + assert(!ExternalUses.size() && "We should not have any external users"); >> + } >> return 0; >> } -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mohammad.r.haghighat at intel.com Fri Jul 26 20:18:54 2013 From: mohammad.r.haghighat at intel.com (Haghighat, Mohammad R) Date: Sat, 27 Jul 2013 03:18:54 +0000 Subject: [LLVMdev] [icFuzz] Help needed with analyzing randomly generated tests that fail on clang 3.4 trunk In-Reply-To: <82364744.14764035.1374852805819.JavaMail.root@alcf.anl.gov> References: <668642398.8242115.1372268637416.JavaMail.root@alcf.anl.gov> <82364744.14764035.1374852805819.JavaMail.root@alcf.anl.gov> Message-ID: <94037EC670985B4A85538143FF8D81CD7263734A@ORSMSX107.amr.corp.intel.com> Hal, I ran the failing tests in the attachment to the bug 16431 on the latest clang trunk (version 3.4 trunk 187225). http://llvm.org/bugs/show_bug.cgi?id=16431 The following tests still fail: Tests in diff: t10236 t12206 t2581 t6734 t7788 t7820 t8069 t9982 All tests in InfLoopInClang: t19193 t22300 t25903 t27872 t33143 t8543 Meanwhile, I'll launch a new run of icFuzz and will post the results later. -moh -----Original Message----- From: Hal Finkel [mailto:hfinkel at anl.gov] Sent: Friday, July 26, 2013 8:33 AM To: Haghighat, Mohammad R Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] [icFuzz] Help needed with analyzing randomly generated tests that fail on clang 3.4 trunk ----- Original Message ----- > ----- Original Message ----- > > Great job, Hal! > > > > Sure. I'd be happy to run icFuzz and report the fails once these > > bugs > > are fixed and thereafter whenever people want new runs. Obviously, > > this can be automated, but the problem is that icFuzz is not > > currently open sourced. > > I would be happy to see this open sourced, but I think that we can > work something out regardless. 
> > Also, once we get the current set of things resolved, I think it > would be useful to test running with: > > -- -O3, LTO (-O4 or -flto), > -- -fslp-vectorize, -fslp-vectorize-aggressive (which are actually > separate optimizations) > -- -ffast-math (if you can do floating point with tolerances, or at > least -ffinite-math-only), -fno-math-errno > (and there are obviously a whole bunch of non-default > code-generation and target options). > > Is it feasible to set up runs with different flags? > > > Once there's a bug in the compiler, there's > > really no limit in the number of failing tests that can be > > generated, so it's more productive to run the generator after the > > previously reported bugs are fixed. > > Agreed. > > > > > We've also seen cases where the results of "clang -O2" are > > different > > on Mac vs. Linux/Windows. > > I recall an issue related to default settings for FP, and differences > with libm implementation. Are there non-floating-point cases? > > > > > Just let me know when you want a new run. > > Will do! Mohammad, Can you please re-run these now? I know that the original loop-vectorizer bugs causing the miscompiles have been fixed, and the others also seem to have been resolved as well. Thanks again, Hal > > -Hal > > > > > Cheers, > > -moh > > > > -----Original Message----- > > From: Hal Finkel [mailto:hfinkel at anl.gov] > > Sent: Wednesday, June 26, 2013 7:35 AM > > To: Haghighat, Mohammad R > > Cc: llvmdev at cs.uiuc.edu; Jim Grosbach > > Subject: Re: [LLVMdev] [icFuzz] Help needed with analyzing randomly > > generated tests that fail on clang 3.4 trunk > > > > ----- Original Message ----- > > > > > > Hi Moh, > > > > > > > > > Thanks for this. I’m really glad to see the work you’re doing in > > > this > > > area and believe it will be extremely helpful in improving the > > > quality of the compiler. 
> > > > > > > > > -Jim > > > > > > > > > > > > On Jun 24, 2013, at 4:10 PM, Haghighat, Mohammad R < > > > mohammad.r.haghighat at intel.com > wrote: > > > > > > > > > > > > > > > > > > Hi, > > > > > > I just submitted a bug report with a package containing 107 small > > > test cases that fail on the latest LLVM/clang 3.4 main trunk > > > (184563). Included are test sources, compilation commands, test > > > input files, and results at –O0 and –O2 when applicable. > > > > > > http://llvm.org/bugs/show_bug.cgi?id=16431 > > > > > > These tests have been automatically generated by an internal tool > > > at > > > Intel, the Intel Compiler fuzzer, icFuzz. The tests are typically > > > very small. For example, for the following simple loop (test > > > t5702) > > > on MacOS X, clang at –O2 generates a binary that crashes: > > > > > > // Test Loop Interchange > > > for (j = 2; j < 76; j++) { > > > for (jm = 1; jm < 30; jm++) { > > > h[j-1][jm-1] = j + 83; > > > } > > > } > > > > > > The tests are put in to two categories > > > - tests that have different runtime outputs when compiled at -O0 > > > and > > > -O2 (this category also includes runtime crashes) > > > - tests that cause infinite loops in the Clang optimizer > > > > > > Many of these failing tests could be due to the same bug, thus a > > > much > > > smaller number of root problems are expected. > > > > > > Any help with triaging these bugs would be highly appreciated. > > > > I've gone through all of the miscompile cases, used bugpoint to > > reduce them, and opened individual PRs for several distinct bugs. > > So > > far we have: PR16455 (loop vectorizer), PR16457 (sccp), PR16460 > > (instcombine). Thanks again for doing this! Do you plan on > > repeating > > this testing on a regular basis? Can it be automated? 
> > > > -Hal > > > > > > > > Thanks, > > > -moh > > > > > > _______________________________________________ > > > LLVM Developers mailing list > > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > > _______________________________________________ > > > LLVM Developers mailing list > > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > > > -- > > Hal Finkel > > Assistant Computational Scientist > > Leadership Computing Facility > > Argonne National Laboratory > > > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory From hfinkel at anl.gov Fri Jul 26 21:39:58 2013 From: hfinkel at anl.gov (Hal Finkel) Date: Fri, 26 Jul 2013 23:39:58 -0500 (CDT) Subject: [LLVMdev] arch-specific predefines in LLVM's source In-Reply-To: Message-ID: <817814613.14956146.1374899998447.JavaMail.root@alcf.anl.gov> ----- Original Message ----- > > ----- Original Message ----- > >> Hi all, > >> My recent commit r187027 fixed a simple oversight of forgetting > >> to > >> check for __ppc__ (only checking __powerpc__), which broke my > >> powerpc-apple-darwin8 stage1 tests, since the system gcc only > >> provided > >> __ppc__. I was wondering if this justifies using simpler macros > >> like > >> > >> #define LLVM_PPC (defined(__ppc__) || defined(__powerpc__) ...) > >> #define LLVM_PPC64 (defined(__ppc64__) || defined(__powerpc64__) > >> ...) 
> > > > A general note: Given all of the compiler variance out there > > regarding whether the 32-bit macros are also defined on the 64-bit > > systems, I really think that you'll need to arrange these as: > > > > #define LLVM_PPC64 (defined(__ppc64__) || defined(__powerpc64__) > > ...) > > #define LLVM_PPC !LLVM_PPC64 && (defined(__ppc__) || > > defined(__powerpc__) ...) > > or define more clearly: > > #define LLVM_PPC32 (((defined(__ppc__) || defined(__powerpc__)) && > !LLVM_PPC64) > #define LLVM_PPC_ANY (LLVM_PPC32 || LLVM_PPC64) > > or > > #define LLVM_PPC_ANY (defined(__ppc__) || defined(__powerpc__)) > #define LLVM_PPC32 (LLVM_PPC_ANY && !LLVM_PPC64) I thought that you had discovered that on Darwin __ppc__, etc. are defined only for 32-bit builds. Is that correct? -Hal > > ? > > >> I've even seen __POWERPC__, _POWER, _ARCH_PPC being tested in > >> conditionals. > >> > >> These proposed standardized macros would only be used in LLVM > >> project > >> sources; there's no reason to exported them. > >> The standardized macros would simplify conditionals and make their > >> use > >> less error-prone. > >> > >> What predefines do other architectures use? > > > > Would all uses of these macros be restricted to the PPC backend, or > > would they appear elsewhere as well? > > One place I see these predefines outside of the PPC backend is > lib/Support/Host.cpp > > Fang > > > -Hal > > > >> > >> What would be a suitable place for these proposed macros? > >> include/llvm/Support/Compiler.h? > >> include/llvm/Support/Arch.h (new)? 
> >> > >> Fang > > > -- > David Fang > http://www.csl.cornell.edu/~fang/ > > -- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory From shea at shealevy.com Fri Jul 26 22:10:15 2013 From: shea at shealevy.com (shea at shealevy.com) Date: Sat, 27 Jul 2013 01:10:15 -0400 Subject: [LLVMdev] =?utf-8?q?Ideal_way_to_build_llvm_and_friends=3F?= Message-ID: <0e58a07ff00fafdba6a54c86e2ef1345@shealevy.com> Hi all, I'm a maintainer for the llvm ecosystem for nixpkgs[1]. Currently only clang and llvm are up-to-date, but I'd like to move to building all of the packages listed on the release page for the latest version (and in particular, have libc++ and compiler RT integrated in with llvm/clang). Right now we're using the CMake build, with llvm and clang being built as separate packages. Due to a quirk in our package manager that's no longer an issue, having them be built together was previously not a good option. I'd like to revamp this to do this the "official" way, but I'm not really sure what that is. CMake or autotools? Is it best to build everything as a single big build (putting clang, lld, lldb and polly in the "tools" subdir and everything else in projects)? Or should I build them all separately? I see libc++ is purposefully excluded from projects/CMakeLists.txt and compiler-rt is built differently from all other projects, why is that? Where does libc++abi fit in with all of this (it's not on the release page)? In general it seems like there are a lot of options and the documentation is not up-to-date everywhere, so I'm a bit lost. Note that while building from SVN is an option if it's really important, ideally we only want released packages, optionally with small patches where needed. 
Thanks, Shea [1]: http://nixos.org/nixpkgs/ From rkotler at mips.com Fri Jul 26 22:35:20 2013 From: rkotler at mips.com (Reed Kotler) Date: Fri, 26 Jul 2013 22:35:20 -0700 Subject: [LLVMdev] floor In-Reply-To: <51F2FF57.8070906@mips.com> References: <51F2F93B.3040102@mips.com> <51F2FF57.8070906@mips.com> Message-ID: <51F35C18.8060400@mips.com> Consider include float y = 1.0; int main() { float x = sin(y); printf("%e \n", x); } Sin works. /local/llvmpb_config/install/bin/clang -target mipsel-linux-gnu sin1.c -o sin1.s -lm -mips16 -S -fPIC .... lw $3, %got(y)($2) lw $4, 0($3) lw $3, %call16(__mips16_extendsfdf2)($2) move $25, $3 move $gp, $2 sw $2, 36 ( $16 ); sw $3, 32 ( $16 ); jalrc $3 lw $4, 36 ( $16 ); lw $5, %call16(sin)($4) lw $6, %got(__mips16_call_stub_df_2)($4) sw $2, 28 ( $16 ); move $2, $5 lw $4, 28 ( $16 ); move $5, $3 lw $3, 36 ( $16 ); move $gp, $3 jalrc $6 .... On 07/26/2013 03:59 PM, Reed Kotler wrote: > > Here is a test case: > > extern double floor(double); > extern double floor_(double); > > double x = 1.5; > double y, y_; > > void foo() { > > double y = floor(x); > double y_ = floor_(x); > } > > > If I compile this for Mips16, it calls the proper helper function for > floor_ but not for floor, because the signature for floor in callee info > is wrong. Args[0] = void RetTy = void > > /local/llvmpb_config/install/bin/clang -target mipsel-linux-gnu floor1.c > -o floor1.s -mips16 -S -fPIC > > ..... > lw $3, %got(x)($2) > lw $4, 0($3) > lw $5, 4($3) > lw $6, %call16(floor)($2) > move $25, $6 > move $gp, $2 > sw $2, 20 ( $16 ); > sw $3, 16 ( $16 ); > jalrc $6 > sw $3, 36($16) > sw $2, 32($16) > lw $2, 16 ( $16 ); > lw $5, 4($2) > lw $4, 0($2) > lw $3, 20 ( $16 ); > lw $2, %call16(floor_)($3) > lw $6, %got(__mips16_call_stub_df_2)($3) > move $gp, $3 > jalrc $6 > > .... 
> > > On 07/26/2013 03:33 PM, reed kotler wrote: >> I'm getting some problems because it seems that the compiler is treating >> "floor" differently from other math library functions like "sin". >> >> The Args and RetVal have the parameter and return types marked as void. >> >> For mips16, it's important that I be able to know the original signature >> for floating point functions. >> >> In some cases, need to create calls to helper functions based on the >> signatures. >> >> In newer code I've written, I've actually moved this logica to an IR >> pass and in that case I know for sure. >> >> But this part of the code is in ISelLowering code and I rely on getting >> the proper signature information. >> >> I'm looking at llvm now to see how this is occurring but maybe someone >> just knows. >> >> Tia. >> >> Reed From baldrick at free.fr Sat Jul 27 05:20:18 2013 From: baldrick at free.fr (Duncan Sands) Date: Sat, 27 Jul 2013 14:20:18 +0200 Subject: [LLVMdev] Building libLLVMSupport library - special tricks? In-Reply-To: <1579126844.1194288.1374861080542.JavaMail.root@sz0093a.westchester.pa.mail.comcast.net> References: <1579126844.1194288.1374861080542.JavaMail.root@sz0093a.westchester.pa.mail.comcast.net> Message-ID: <51F3BB02.9010607@free.fr> Hi Lou, On 26/07/13 19:51, Lou Picciano wrote: > Duncan, > > Yes, I'd been pretty much performing the same diagnosis in parallel - it seems > we just don't have that libLLVMSupport library built. did you build LLVM? It's part of LLVM not part of clang. It's a very fundamental part of LLVM too, so I don't understand how you can possibly not have it. Where did you get your LLVM installation from? Ciao, Duncan. > > Last night, tried the build of clang a few different ways to get that library > built - including specifying the --gcc-toolchain option. No Joy! > > Which of the clang/llvm source trees provides it? or is it in another source > package? tools? Is there a Special Sauce which I must cook up? 
> > Lou Picciano > > (have changed topic title to re-focus on the core issue here!) > > > Hi, > > On 25/07/13 15:47, Lou Picciano wrote: > > Duncan, > > Many thanks for your comments. > > > > The core issue we're running into is this: > > > > $ GCC=/usr/bin/gcc LLVM_CONFIG=/usr/bin/llvm-config make > > Compiling utils/TargetInfo.cpp > > Linking TargetInfo > > ld: fatal: library -lLLVMSupport: not found > > llvm-config is supposed to say where the libraries are. Take a look at the > output of > usr/bin/llvm-config --ldflags > > On my system it outputs > -L/usr/local/lib -lrt -ldl -lpthread -lz > and indeed libLLVMSupport is there > $ ls /usr/local/lib/libLLVMSupport* > /usr/local/lib/libLLVMSupport.a > > > ld: fatal: file processing errors. No output written to TargetInfo > > collect2: error: ld returned 1 exit status > > > > All other gyrations are attempts to shoehorn LLVMSupport into the compile. I've > > been sourcing the Makefile and README for hints. > From fang at csl.cornell.edu Sat Jul 27 08:18:46 2013 From: fang at csl.cornell.edu (David Fang) Date: Sat, 27 Jul 2013 11:18:46 -0400 (EDT) Subject: [LLVMdev] arch-specific predefines in LLVM's source In-Reply-To: <817814613.14956146.1374899998447.JavaMail.root@alcf.anl.gov> References: <817814613.14956146.1374899998447.JavaMail.root@alcf.anl.gov> Message-ID: Hi, > ----- Original Message ----- >>> ----- Original Message ----- >>>> Hi all, >>>> My recent commit r187027 fixed a simple oversight of forgetting >>>> to >>>> check for __ppc__ (only checking __powerpc__), which broke my >>>> powerpc-apple-darwin8 stage1 tests, since the system gcc only >>>> provided >>>> __ppc__. I was wondering if this justifies using simpler macros >>>> like >>>> >>>> #define LLVM_PPC (defined(__ppc__) || defined(__powerpc__) ...) >>>> #define LLVM_PPC64 (defined(__ppc64__) || defined(__powerpc64__) >>>> ...) 
>> or define more clearly: >> >> #define LLVM_PPC32 (((defined(__ppc__) || defined(__powerpc__)) && >> !LLVM_PPC64) >> #define LLVM_PPC_ANY (LLVM_PPC32 || LLVM_PPC64) >> >> or >> >> #define LLVM_PPC_ANY (defined(__ppc__) || defined(__powerpc__)) >> #define LLVM_PPC32 (LLVM_PPC_ANY && !LLVM_PPC64) > > I thought that you had discovered that on Darwin __ppc__, etc. are defined only for 32-bit builds. Is that correct? > > -Hal Ah yes, with apple-gcc-4.0.1 (powerpc) -m32 only defines __ppc__ -m64 only defines __ppc64__ So a more correct set of definitions would be: #define LLVM_PPC32 ((defined(__ppc__) || (defined(__powerpc__) && !defined(__powerpc64__))) #define LLVM_PPC64 (defined(__ppc64__) || defined(__powerpc64__)) #define LLVM_PPC_ANY (LLVM_PPC32 || LLVM_PPC64) Fang >>>> I've even seen __POWERPC__, _POWER, _ARCH_PPC being tested in >>>> conditionals. >>>> >>>> These proposed standardized macros would only be used in LLVM >>>> project >>>> sources; there's no reason to exported them. >>>> The standardized macros would simplify conditionals and make their >>>> use >>>> less error-prone. >>>> >>>> What predefines do other architectures use? >>> >>> Would all uses of these macros be restricted to the PPC backend, or >>> would they appear elsewhere as well? >> >> One place I see these predefines outside of the PPC backend is >> lib/Support/Host.cpp >> >> Fang >> >>> -Hal >>> >>>> >>>> What would be a suitable place for these proposed macros? >>>> include/llvm/Support/Compiler.h? >>>> include/llvm/Support/Arch.h (new)? 
>>>> >>>> Fang >> >> >> -- >> David Fang >> http://www.csl.cornell.edu/~fang/ >> >> > > -- David Fang http://www.csl.cornell.edu/~fang/ From dberlin at dberlin.org Sat Jul 27 10:48:12 2013 From: dberlin at dberlin.org (Daniel Berlin) Date: Sat, 27 Jul 2013 10:48:12 -0700 Subject: [LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize. In-Reply-To: References: <20130726230756.2F53B352400B@llvm.org> Message-ID: Hi Nadav, Okay. 1. The comment doesn't make this clear. I would suggest, at a minimum, updating it to mention pairs specifically, to avoid the issue in #2 2. If the day comes when the selectiondag store vectorizer handles more than pairs, and does so better, is anyone really going to remember this random 3 exists in the other vectorizer? I would posit, based on experience, the answer is "no" :) On Fri, Jul 26, 2013 at 7:56 PM, Nadav Rotem wrote: > Hi Daniel, > > Maybe my commit message was not clear. The idea is that the SelectionDAG > store vectorizer can only handle pairs. So, the number three means "more > than a pair". > > Thanks, > Nadav > > Sent from my iPhone > > On Jul 26, 2013, at 17:48, Daniel Berlin wrote: > > Hey Nadav, > I'd humbly suggest that rather than use 3 directly, you should add a > shared constant between these two passes, so when one changes, the other > doesn't need to be updated. It would also ensure this bit of info about > what needs to be updated isn't only contained in the comments.. > > On Fri, Jul 26, 2013 at 4:07 PM, Nadav Rotem wrote: > >> Author: nadav >> Date: Fri Jul 26 18:07:55 2013 >> New Revision: 187267 >> >> URL: http://llvm.org/viewvc/llvm-project?rev=187267&view=rev >> Log: >> SLP Vectorier: Don't vectorize really short chains because they are >> already handled by the SelectionDAG store-vectorizer, which does a >> better job in deciding when to vectorize. 
> > > >> >> Modified: llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp >> URL: >> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp?rev=187267&r1=187266&r2=187267&view=diff >> >> ============================================================================== >> --- llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp (original) >> +++ llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp Fri Jul 26 >> 18:07:55 2013 >> @@ -898,8 +898,12 @@ int BoUpSLP::getTreeCost() { >> DEBUG(dbgs() << "SLP: Calculating cost for tree of size " << >> VectorizableTree.size() << ".\n"); >> >> - if (!VectorizableTree.size()) { >> - assert(!ExternalUses.size() && "We should not have any external >> users"); >> + // Don't vectorize tiny trees. Small load/store chains or consecutive >> stores >> + // of constants will be vectoried in SelectionDAG in >> MergeConsecutiveStores. >> + if (VectorizableTree.size() < 3) { >> + if (!VectorizableTree.size()) { >> + assert(!ExternalUses.size() && "We should not have any external >> users"); >> + } >> return 0; >> } >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From nrotem at apple.com Sat Jul 27 16:31:55 2013 From: nrotem at apple.com (Nadav Rotem) Date: Sat, 27 Jul 2013 16:31:55 -0700 Subject: [LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize. In-Reply-To: References: <20130726230756.2F53B352400B@llvm.org> Message-ID: <19398E76-5898-4930-82B8-CCAC3CE3DBE6@apple.com> Hi Daniel, Thanks for the review! + // The SelectionDAG vectorizer can only handle pairs (trees of height = 2). r187316. -Nadav On Jul 27, 2013, at 10:48 AM, Daniel Berlin wrote: > Hi Nadav, > > Okay. > > 1. The comment doesn't make this clear. I would suggest, at a minimum, updating it to mention pairs specifically, to avoid the issue in #2 > > 2. 
If the day comes when the selectiondag store vectorizer handles more than pairs, and does so better, is anyone really going to remember this random 3 exists in the other vectorizer? > > I would posit, based on experience, the answer is "no" :) > > > > On Fri, Jul 26, 2013 at 7:56 PM, Nadav Rotem wrote: > Hi Daniel, > > Maybe my commit message was not clear. The idea is that the SelectionDAG store vectorizer can only handle pairs. So, the number three means "more than a pair". > > Thanks, > Nadav > > Sent from my iPhone > > On Jul 26, 2013, at 17:48, Daniel Berlin wrote: > >> Hey Nadav, >> I'd humbly suggest that rather than use 3 directly, you should add a shared constant between these two passes, so when one changes, the other doesn't need to be updated. It would also ensure this bit of info about what needs to be updated isn't only contained in the comments.. >> >> On Fri, Jul 26, 2013 at 4:07 PM, Nadav Rotem wrote: >> Author: nadav >> Date: Fri Jul 26 18:07:55 2013 >> New Revision: 187267 >> >> URL: http://llvm.org/viewvc/llvm-project?rev=187267&view=rev >> Log: >> SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize. 
>> >> Modified: llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp >> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp?rev=187267&r1=187266&r2=187267&view=diff >> ============================================================================== >> --- llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp (original) >> +++ llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp Fri Jul 26 18:07:55 2013 >> @@ -898,8 +898,12 @@ int BoUpSLP::getTreeCost() { >> DEBUG(dbgs() << "SLP: Calculating cost for tree of size " << >> VectorizableTree.size() << ".\n"); >> >> - if (!VectorizableTree.size()) { >> - assert(!ExternalUses.size() && "We should not have any external users"); >> + // Don't vectorize tiny trees. Small load/store chains or consecutive stores >> + // of constants will be vectoried in SelectionDAG in MergeConsecutiveStores. >> + if (VectorizableTree.size() < 3) { >> + if (!VectorizableTree.size()) { >> + assert(!ExternalUses.size() && "We should not have any external users"); >> + } >> return 0; >> } -------------- next part -------------- An HTML attachment was scrubbed... URL: From shuxin.llvm at gmail.com Sat Jul 27 17:47:35 2013 From: shuxin.llvm at gmail.com (Shuxin Yang) Date: Sat, 27 Jul 2013 17:47:35 -0700 Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: <51E74E72.5000902@gmail.com> References: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> <51E74E72.5000902@gmail.com> Message-ID: <51F46A27.2000502@gmail.com> Hi, Sean: I'm sorry I lied. I didn't mean to lie. I did try to avoid making a *BIG* change to the IPO pass-ordering for now. However, when I made a minor change to populateLTOPassManager() by separating module passes from non-module passes, I saw quite a few performance differences, most of them degradations. Attacking these degradations one by one in a piecemeal manner is a waste of time.
We might as well define the pass-ordering for the Pre-IPO, IPO and Post-IPO phases at this time, and hopefully once and for all. In order to repair my image as a liar, I am posting some preliminary results on this cozy Saturday afternoon which I normally devote to daydreaming :-) So far I have only measured the MultiSource benchmarks on my iMac (late 2012 model); the command to run them is "make TEST=simple report OPTFLAGS='-O3 -flto'". In terms of execution time, some benchmarks degrade, but more improve, and a few of the improvements are quite substantial. User time is used for the comparison. I measured twice, and the results are basically stable. As far as I can tell from the results, the proposed pass-ordering is basically a change for the better. Interestingly enough, if I combine populatePreIPOPassMgr() as the pre-IPO phase (see the patch) with the original populateLTOPassManager() for both IPO and post-IPO, I see a significant improvement on "Benchmarks/Trimaran/netbench-crc/netbench-crc" (about 94%, 0.5665s (was) vs 0.0295s); as I write this mail, I have not yet had a chance to figure out why this combination improves this benchmark so much. In terms of compile time, the report claims my change improves compile time by about 2x, which is nonsense; I guess the test script doesn't count link time. The new Pre-IPO, IPO, and Post-IPO pass orderings are defined by populate{PreIPO|IPO|PostIPO}PassMgr(). I will discuss with Andy next Monday in order to be consistent with the pass-ordering design he is envisioning, and will measure more benchmarks and then post the patch and results to the community for discussion and approval. Thanks, Shuxin On 7/17/13 7:09 PM, Shuxin Yang wrote: > Andy and I briefly discussed this the other day; we have not yet had a chance to list a detailed pass order > for the pre- and post-IPO scalar optimizations. > > This is the wish-list in our minds: > > pre-IPO: based on the ordering he proposes, get rid of the inlining (or just inline tiny functions), get rid of > all loop xforms... 
> > post-IPO: get rid of inlining (or maybe we still need it, only performing the inlining on callees which have now become tiny). > enable the loop xforms. > > The SCC pass manager seems to be important for inlining; no > matter what the inlining looks like in the future, > I think the pass manager is still useful for scalar > opt. It enables us to achieve cheap inter-procedural > opt hands down, in the sense that we can optimize a > callee, analyze it, and feed whatever detailed > info back to the caller (say, info like "the callee > returns the constant 5" or "the callee's return value is in 5-10"; such info is difficult to obtain at the IPO stage, as > it cannot afford to take such a close look). > > I think it is too early to discuss the pre-IPO and post-IPO thing; let > us focus on what Andy is proposing. > > > On 7/17/13 6:04 PM, Sean Silva wrote: >> There seems to be a lot of interest recently in LTO. How do you see >> the situation of splitting the IR passes between per-TU processing >> and multi-TU ("link time") processing? >> >> -- Sean Silva >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- name exec_was exec_is exec_diff ------------------------------------------- ---------- ---------- ---------------- Benchmarks/TSVC/Symbolics-flt/Symbolics-flt 1.4634 0.684 -53.259532595326 Benchmarks/MiBench/security-sha/security-sh 0.0199 0.0128 -35.678391959799 Benchmarks/mediabench/adpcm/rawcaudio/rawca 0.0034 0.0025 -26.470588235294 Benchmarks/Prolangs-C/agrep/agrep 0.0032 0.0025 -21.875 Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg 0.0032 0.0025 -21.875 Benchmarks/Olden/perimeter/perimeter 0.1747 0.1422 -18.603319977103 Benchmarks/mediabench/adpcm/rawdaudio/rawda 0.0022 0.0018 -18.181818181818 Benchmarks/FreeBench/fourinarow/fourinarow 0.2457 0.2018 -17.867317867317 Benchmarks/Prolangs-C++/family/family 0.0006 0.0005 -16.666666666666 Applications/ALAC/encode/alacconvert-encode 0.0314 0.0264 -15.923566878980 Benchmarks/MiBench/security-rijndael/securi 0.0243 0.0207 -14.814814814814 Benchmarks/mediabench/gsm/toast/toast 0.0174 0.0149 -14.367816091954 Benchmarks/Prolangs-C++/shapes/shapes 0.0007 0.0006 -14.285714285714 Benchmarks/Prolangs-C/bison/mybison 0.0021 0.0018 -14.285714285714 Benchmarks/TSVC/Symbolics-dbl/Symbolics-dbl 2.1248 1.8634 -12.302334337349 Benchmarks/McCat/03-testtrie/testtrie 0.0092 0.0081 -11.956521739130 Applications/treecc/treecc 0.0009 0.0008 -11.111111111111 Benchmarks/Prolangs-C/cdecl/cdecl 0.0009 0.0008 -11.111111111111 Benchmarks/TSVC/NodeSplitting-flt/NodeSplit 2.3019 2.0529 -10.817151049133 Benchmarks/MiBench/network-patricia/network 0.0647 0.0581 -10.200927357032 Benchmarks/McCat/09-vor/vor 0.0816 0.0735 -9.9264705882353 Benchmarks/MallocBench/gs/gs 0.029 0.0262 -9.6551724137931 Benchmarks/MiBench/telecomm-CRC32/telecomm- 0.1227 0.1122 -8.5574572127139 Benchmarks/TSVC/ControlLoops-flt/ControlLoo 1.5978 1.4648 -8.3239454249593 Applications/hexxagon/hexxagon 4.9682 4.566 -8.0954872992230 Benchmarks/Prolangs-C++/simul/simul 0.0043 0.004 -6.9767441860465 Benchmarks/TSVC/Reductions-dbl/Reductions-d 
2.3107 2.1611 -6.4742285887393 Benchmarks/TSVC/LinearDependence-dbl/Linear 2.5083 2.3536 -6.1675238209145 Benchmarks/TSVC/LinearDependence-flt/Linear 2.0396 1.9215 -5.7903510492253 Benchmarks/TSVC/ControlLoops-dbl/ControlLoo 2.1258 2.0077 -5.5555555555555 Benchmarks/MiBench/consumer-lame/consumer-l 0.1355 0.1285 -5.1660516605166 Benchmarks/Trimaran/enc-rc4/enc-rc4 0.6262 0.5967 -4.7109549664643 Applications/oggenc/oggenc 0.077 0.0735 -4.5454545454545 Benchmarks/BitBench/uuencode/uuencode 0.0119 0.0114 -4.2016806722689 Benchmarks/Prolangs-C/unix-smail/unix-smail 0.0024 0.0023 -4.1666666666666 Benchmarks/TSVC/InductionVariable-dbl/Induc 2.9528 2.8362 -3.9487943646708 Benchmarks/TSVC/NodeSplitting-dbl/NodeSplit 2.7203 2.6209 -3.6540087490350 Applications/d/make_dparser 0.0174 0.0168 -3.4482758620689 Applications/lambda-0.1.3/lambda 2.6777 2.5864 -3.4096426037270 Applications/viterbi/viterbi 1.8383 1.777 -3.3346026219877 Benchmarks/MiBench/telecomm-gsm/telecomm-gs 0.1172 0.1134 -3.2423208191126 Benchmarks/McCat/18-imp/imp 0.0415 0.0402 -3.1325301204819 Benchmarks/MiBench/automotive-bitcount/auto 0.0518 0.0502 -3.0888030888030 Benchmarks/FreeBench/analyzer/analyzer 0.0333 0.0323 -3.0030030030030 Benchmarks/Prolangs-C++/city/city 0.0036 0.0035 -2.7777777777777 Benchmarks/TSVC/Reductions-flt/Reductions-f 4.4121 4.2942 -2.6721969130346 Benchmarks/Olden/tsp/tsp 0.5126 0.5011 -2.2434646898166 Benchmarks/Trimaran/enc-pc1/enc-pc1 0.1574 0.154 -2.1601016518424 Benchmarks/TSVC/ControlFlow-flt/ControlFlow 2.351 2.3012 -2.1182475542322 Benchmarks/MiBench/network-dijkstra/network 0.0296 0.029 -2.0270270270270 Benchmarks/Ptrdist/bc/bc 0.4764 0.4674 -1.8891687657430 Benchmarks/Prolangs-C/gnugo/gnugo 0.028 0.0275 -1.7857142857142 Benchmarks/VersaBench/dbms/dbms 0.8088 0.7949 -1.7185954500494 Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk 3.7015 3.6379 -1.7182223422936 Benchmarks/Olden/health/health 0.1787 0.1757 -1.6787912702854 Benchmarks/VersaBench/bmm/bmm 1.4694 1.4455 -1.6265142234925 
Benchmarks/McCat/01-qbsort/qbsort 0.0876 0.0862 -1.5981735159817 Applications/ClamAV/clamscan 0.094 0.0925 -1.5957446808510 Benchmarks/McCat/17-bintr/bintr 0.0666 0.0658 -1.2012012012012 Benchmarks/MiBench/automotive-susan/automot 0.0312 0.0309 -0.9615384615384 Benchmarks/TSVC/LoopRerolling-dbl/LoopRerol 2.7783 2.7524 -0.9322247417485 Benchmarks/SciMark2-C/scimark2 22.2684 22.0824 -0.8352643207414 Benchmarks/mediabench/g721/g721encode/encod 0.0403 0.04 -0.7444168734491 Benchmarks/ASC_Sequoia/AMGmk/AMGmk 5.0381 5.0033 -0.6907365872054 Benchmarks/TSVC/GlobalDataFlow-dbl/GlobalDa 2.3246 2.3089 -0.6753850124752 Applications/sgefa/sgefa 0.0962 0.0956 -0.6237006237006 Applications/minisat/minisat 4.021 4.0023 -0.4650584431733 Benchmarks/llubenchmark/llu 2.8277 2.8147 -0.4597375959260 Benchmarks/TSVC/Expansion-flt/Expansion-flt 1.8036 1.7961 -0.4158349966733 Applications/aha/aha 1.1345 1.1299 -0.4054649625385 Benchmarks/TSVC/Expansion-dbl/Expansion-dbl 2.5986 2.5886 -0.3848225967828 Benchmarks/PAQ8p/paq8p 33.6364 33.5149 -0.3612158257126 Benchmarks/FreeBench/neural/neural 0.1771 0.1765 -0.3387916431394 Benchmarks/Ptrdist/ft/ft 0.6569 0.6549 -0.3044603440401 Benchmarks/Trimaran/enc-3des/enc-3des 1.3386 1.3354 -0.2390557298670 Benchmarks/VersaBench/ecbdes/ecbdes 1.5638 1.5623 -0.0959201943982 Benchmarks/TSVC/Recurrences-dbl/Recurrences 2.8128 2.8102 -0.0924345847554 Benchmarks/Trimaran/netbench-crc/netbench-c 0.5665 0.566 -0.0882612533098 Benchmarks/Prolangs-C++/life/life 1.826 1.8244 -0.0876232201533 Benchmarks/TSVC/ControlFlow-dbl/ControlFlow 2.6993 2.6973 -0.0740932834438 Benchmarks/TSVC/Packing-flt/Packing-flt 2.6722 2.6716 -0.0224534091759 Benchmarks/TSVC/Searching-flt/Searching-flt 3.3246 3.324 -0.0180472838837 Benchmarks/TSVC/Searching-dbl/Searching-dbl 3.3563 3.3558 -0.0148973572088 Benchmarks/TSVC/Equivalencing-flt/Equivalen 0.9735 0.9734 -0.0102722136620 Applications/Burg/burg 0.0008 0.0008 0.0 Applications/hbd/hbd 0.0018 0.0018 0.0 
Benchmarks/BitBench/uudecode/uudecode 0.0243 0.0243 0.0 Benchmarks/McCat/04-bisect/bisect 0.0696 0.0696 0.0 Benchmarks/McCat/05-eks/eks 0.0021 0.0021 0.0 Benchmarks/McCat/15-trie/trie 0.0008 0.0008 0.0 Benchmarks/MiBench/consumer-jpeg/consumer-j 0.0028 0.0028 0.0 Benchmarks/MiBench/office-ispell/office-isp 0.0006 0.0006 0.0 Benchmarks/MiBench/security-blowfish/securi 0.0007 0.0007 0.0 Benchmarks/MiBench/telecomm-adpcm/telecomm- 0.0006 0.0006 0.0 Benchmarks/Prolangs-C++/NP/np 0.0006 0.0006 0.0 Benchmarks/Prolangs-C++/deriv1/deriv1 0.0006 0.0006 0.0 Benchmarks/Prolangs-C++/deriv2/deriv2 0.0006 0.0006 0.0 Benchmarks/Prolangs-C++/employ/employ 0.0038 0.0038 0.0 Benchmarks/Prolangs-C++/fsm/fsm 0.0005 0.0005 0.0 Benchmarks/Prolangs-C++/garage/garage 0.0006 0.0006 0.0 Benchmarks/Prolangs-C++/ocean/ocean 0.042 0.042 0.0 Benchmarks/Prolangs-C++/office/office 0.0006 0.0006 0.0 Benchmarks/Prolangs-C++/trees/trees 0.0006 0.0006 0.0 Benchmarks/Prolangs-C++/vcirc/vcirc 0.0005 0.0005 0.0 Benchmarks/Prolangs-C/allroots/allroots 0.0006 0.0006 0.0 Benchmarks/Prolangs-C/compiler/compiler 0.0006 0.0006 0.0 Benchmarks/Prolangs-C/fixoutput/fixoutput 0.0006 0.0006 0.0 Benchmarks/Prolangs-C/football/football 0.0005 0.0005 0.0 Benchmarks/Prolangs-C/loader/loader 0.0006 0.0006 0.0 Benchmarks/Prolangs-C/simulator/simulator 0.0006 0.0006 0.0 Benchmarks/Prolangs-C/unix-tbl/unix-tbl 0.0006 0.0006 0.0 Benchmarks/TSVC/Recurrences-flt/Recurrences 2.7172 2.7173 0.00368025909023 Benchmarks/TSVC/StatementReordering-dbl/Sta 2.5547 2.555 0.01174306180765 Benchmarks/Trimaran/enc-md5/enc-md5 1.2119 1.2126 0.05776054129878 Benchmarks/MiBench/automotive-basicmath/aut 0.1698 0.1699 0.05889281507655 Benchmarks/ASC_Sequoia/IRSmk/IRSmk 2.6607 2.6626 0.07140977938136 Benchmarks/Fhourstones-3.1/fhourstones3.1 0.7427 0.7433 0.08078632018310 Benchmarks/TSVC/LoopRestructuring-dbl/LoopR 2.9857 2.9883 0.08708175637204 Benchmarks/Olden/em3d/em3d 2.0241 2.0262 0.10374981473247 
Benchmarks/TSVC/LoopRerolling-flt/LoopRerol 2.0889 2.0914 0.11968021446694 Benchmarks/TSVC/Packing-dbl/Packing-dbl 2.8154 2.8196 0.14917951268025 Benchmarks/BitBench/five11/five11 4.038 4.0448 0.16840019811788 Benchmarks/Olden/treeadd/treeadd 0.1588 0.1591 0.18891687657430 Benchmarks/TSVC/IndirectAddressing-flt/Indi 2.1573 2.1615 0.19468780419969 Benchmarks/Ptrdist/anagram/anagram 0.6629 0.6644 0.22627847337455 Benchmarks/TSVC/StatementReordering-flt/Sta 1.8867 1.892 0.28091376477446 Benchmarks/TSVC/IndirectAddressing-dbl/Indi 2.6113 2.6189 0.29104277562899 Benchmarks/FreeBench/pifft/pifft 0.0636 0.0638 0.31446540880501 Benchmarks/Prolangs-C++/primes/primes 0.1916 0.1923 0.36534446764092 Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDa 1.2514 1.2567 0.42352565127056 Benchmarks/Olden/power/power 0.7097 0.7129 0.45089474425813 Benchmarks/ASCI_Purple/SMG2000/smg2000 1.4904 1.4972 0.45625335480408 Applications/lemon/lemon 0.6774 0.6805 0.45763212282255 Benchmarks/MiBench/telecomm-FFT/telecomm-ff 0.0209 0.021 0.47846889952154 Benchmarks/7zip/7zip-benchmark 5.9521 5.9811 0.48722299692545 Benchmarks/TSVC/CrossingThresholds-dbl/Cros 2.6449 2.6578 0.48773110514575 Applications/SPASS/SPASS 5.9442 5.9748 0.51478752397294 Benchmarks/MallocBench/cfrac/cfrac 1.2635 1.2704 0.54610209734862 Benchmarks/Ptrdist/ks/ks 0.7054 0.7117 0.89311029203288 Benchmarks/MallocBench/espresso/espresso 0.3836 0.3871 0.91240875912408 Applications/JM/lencod/lencod 3.7442 3.7859 1.11372255755568 Benchmarks/TSVC/Equivalencing-dbl/Equivalen 1.3717 1.3881 1.19559670481884 Benchmarks/Olden/bh/bh 0.6255 0.633 1.1990407673861 Benchmarks/VersaBench/8b10b/8b10b 2.8968 2.9416 1.5465341066004 Benchmarks/BitBench/drop3/drop3 0.174 0.1768 1.60919540229886 Benchmarks/McCat/12-IOtest/iotest 0.1223 0.1243 1.63532297628781 Applications/spiff/spiff 1.629 1.6558 1.6451810926949 Benchmarks/TSVC/CrossingThresholds-flt/Cros 2.0682 2.1028 1.67295232569383 Benchmarks/Olden/voronoi/voronoi 0.1569 0.1596 1.72084130019119 
Applications/lua/lua 14.0101 14.2671 1.83439090370518 Benchmarks/nbench/nbench 5.4638 5.568 1.90709762436399 Applications/sqlite3/sqlite3 2.3871 2.4339 1.960537891165 Applications/ALAC/decode/alacconvert-decode 0.0152 0.0155 1.97368421052632 Benchmarks/Trimaran/netbench-url/netbench-u 2.7548 2.8112 2.04733555975025 Benchmarks/Olden/bisort/bisort 0.3265 0.3332 2.05206738131699 Benchmarks/Fhourstones/fhourstones 0.6284 0.6419 2.14831317632083 Applications/JM/ldecod/ldecod 0.0543 0.0556 2.39410681399631 Benchmarks/TSVC/LoopRestructuring-flt/LoopR 2.2302 2.2848 2.4482109227872 Benchmarks/FreeBench/mason/mason 0.1085 0.1113 2.58064516129032 Benchmarks/Bullet/bullet 3.0174 3.0968 2.63140452044807 Applications/SIBsim4/SIBsim4 1.8364 1.8853 2.66281855804835 Benchmarks/McCat/08-main/main 0.0138 0.0142 2.89855072463769 Applications/siod/siod 1.8991 1.9696 3.71228476646833 Benchmarks/FreeBench/distray/distray 0.0793 0.0829 4.53972257250947 Benchmarks/NPB-serial/is/is 4.6101 4.8299 4.76779245569511 Applications/kimwitu++/kc 0.0266 0.0279 4.88721804511279 Benchmarks/Olden/mst/mst 0.0551 0.0589 6.89655172413793 Benchmarks/Ptrdist/yacr2/yacr2 0.5277 0.5663 7.31476217547851 Benchmarks/VersaBench/beamformer/beamformer 0.6497 0.7015 7.97291057411112 Benchmarks/sim/sim 2.6061 2.8147 8.00429760945475 Benchmarks/FreeBench/pcompress2/pcompress2 0.101 0.1097 8.61386138613861 Benchmarks/mafft/pairlocalalign 16.7374 18.4048 9.9621207594967 Benchmarks/MiBench/office-stringsearch/offi 0.001 0.0011 10.0 Benchmarks/TSVC/InductionVariable-flt/Induc 2.0788 2.2966 10.4771983836829 Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2d 0.0076 0.0084 10.5263157894737 Benchmarks/MiBench/consumer-typeset/consume 0.0943 0.1053 11.6648992576882 Benchmarks/tramp3d-v4/tramp3d-v4 0.1849 0.208 12.493239588967 Benchmarks/Prolangs-C++/objects/objects 0.0005 0.0006 20.0 Benchmarks/Prolangs-C/TimberWolfMC/timberwo 0.0005 0.0006 20.0 Benchmarks/Prolangs-C/assembler/assembler 0.0005 0.0006 20.0 -------------- next part 
-------------- Index: include/llvm/Transforms/IPO/PassManagerBuilder.h =================================================================== --- include/llvm/Transforms/IPO/PassManagerBuilder.h (revision 187135) +++ include/llvm/Transforms/IPO/PassManagerBuilder.h (working copy) @@ -132,8 +132,14 @@ /// populateModulePassManager - This sets up the primary pass manager. void populateModulePassManager(PassManagerBase &MPM); - void populateLTOPassManager(PassManagerBase &PM, bool Internalize, - bool RunInliner, bool DisableGVNLoadPRE = false); + + /// setup passes for Pre-IPO phase + void populatePreIPOPassMgr(PassManagerBase &MPM); + + void populateIPOPassManager(PassManagerBase &PM, bool Internalize, + bool RunInliner); + + void populatePostIPOPM(PassManagerBase &PM); }; /// Registers a function for adding a standard set of passes. This should be Index: include/llvm/Transforms/IPO.h =================================================================== --- include/llvm/Transforms/IPO.h (revision 187135) +++ include/llvm/Transforms/IPO.h (working copy) @@ -89,6 +89,7 @@ /// threshold given here. Pass *createFunctionInliningPass(); Pass *createFunctionInliningPass(int Threshold); +Pass *createTinyFuncInliningPass(); //===----------------------------------------------------------------------===// /// createAlwaysInlinerPass - Return a new pass object that inlines only Index: tools/lto/LTOCodeGenerator.cpp =================================================================== --- tools/lto/LTOCodeGenerator.cpp (revision 187135) +++ tools/lto/LTOCodeGenerator.cpp (working copy) @@ -412,11 +412,12 @@ // Enabling internalize here would use its AllButMain variant. It // keeps only main if it exists and does nothing for libraries. Instead // we create the pass ourselves with the symbol list provided by the linker. 
- if (!DisableOpt) - PassManagerBuilder().populateLTOPassManager(passes, - /*Internalize=*/false, - !DisableInline, - DisableGVNLoadPRE); + if (!DisableOpt) { + PassManagerBuilder().populateIPOPassManager(passes, + /*Internalize=*/false, + !DisableInline); + PassManagerBuilder().populatePostIPOPM(passes); + } // Make sure everything is still good. passes.add(createVerifierPass()); Index: tools/opt/opt.cpp =================================================================== --- tools/opt/opt.cpp (revision 187135) +++ tools/opt/opt.cpp (working copy) @@ -104,6 +104,11 @@ cl::desc("Include the standard compile time optimizations")); static cl::opt<bool> +StandardPreIPOOpts("std-preipo-opts", + cl::desc("Include the standard pre-IPO optimizations")); + + +static cl::opt<bool> StandardLinkOpts("std-link-opts", cl::desc("Include the standard link time optimizations")); @@ -470,6 +475,23 @@ Builder.populateModulePassManager(PM); } +static void AddPreIPOCompilePasses(PassManagerBase &PM) { + PM.add(createVerifierPass()); // Verify that input is correct + + // If the -strip-debug command line option was specified, do it. + if (StripDebug) + addPass(PM, createStripSymbolsPass(true)); + + if (DisableOptimizations) return; + + // -std-preipo-opts adds the same module passes as -O3.
+ PassManagerBuilder Builder; + if (!DisableInline) + Builder.Inliner = createTinyFuncInliningPass(); + Builder.OptLevel = 3; + Builder.populatePreIPOPassMgr(PM); +} + static void AddStandardLinkPasses(PassManagerBase &PM) { PM.add(createVerifierPass()); // Verify that input is correct @@ -480,8 +502,9 @@ if (DisableOptimizations) return; PassManagerBuilder Builder; - Builder.populateLTOPassManager(PM, /*Internalize=*/ !DisableInternalize, + Builder.populateIPOPassManager(PM, /*Internalize=*/ !DisableInternalize, /*RunInliner=*/ !DisableInline); + Builder.populatePostIPOPM(PM); } //===----------------------------------------------------------------------===// @@ -778,6 +801,12 @@ StandardCompileOpts = false; } + // If -std-preipo-opts was specified at the end of the pass list, add them. + if (StandardPreIPOOpts) { + AddPreIPOCompilePasses(Passes); + StandardPreIPOOpts = false; + } + if (StandardLinkOpts) { AddStandardLinkPasses(Passes); StandardLinkOpts = false; Index: tools/bugpoint/bugpoint.cpp =================================================================== --- tools/bugpoint/bugpoint.cpp (revision 187135) +++ tools/bugpoint/bugpoint.cpp (working copy) @@ -169,8 +169,9 @@ if (StandardLinkOpts) { PassManagerBuilder Builder; - Builder.populateLTOPassManager(PM, /*Internalize=*/true, + Builder.populateIPOPassManager(PM, /*Internalize=*/true, /*RunInliner=*/true); + Builder.populatePostIPOPM(PM); } if (OptLevelO1 || OptLevelO2 || OptLevelO3) { Index: lib/Transforms/IPO/PassManagerBuilder.cpp =================================================================== --- lib/Transforms/IPO/PassManagerBuilder.cpp (revision 187135) +++ lib/Transforms/IPO/PassManagerBuilder.cpp (working copy) @@ -294,10 +294,78 @@ addExtensionsToPM(EP_OptimizerLast, MPM); } -void PassManagerBuilder::populateLTOPassManager(PassManagerBase &PM, +void PassManagerBuilder::populatePreIPOPassMgr(PassManagerBase &MPM) { + // If all optimizations are disabled, just run the always-inline pass. 
+ if (OptLevel == 0) { + if (Inliner) { + MPM.add(Inliner); + Inliner = 0; + } + return; + } + + bool EnableLightWeightIPO = (OptLevel > 1); + + // Add LibraryInfo if we have some. + if (LibraryInfo) MPM.add(new TargetLibraryInfo(*LibraryInfo)); + addInitialAliasAnalysisPasses(MPM); + + // Start of CallGraph SCC passes. + { + if (EnableLightWeightIPO) { + MPM.add(createPruneEHPass()); // Remove dead EH info + if (Inliner) { + MPM.add(Inliner); + Inliner = 0; + } + MPM.add(createArgumentPromotionPass()); // Scalarize uninlined fn args + } + + // Start of function pass. + { + if (UseNewSROA) + MPM.add(createSROAPass(/*RequiresDomTree*/ false)); + else + MPM.add(createScalarReplAggregatesPass(-1, false)); + + MPM.add(createEarlyCSEPass()); // Catch trivial redundancies + MPM.add(createJumpThreadingPass()); // Thread jumps. + MPM.add(createCorrelatedValuePropagationPass());// Propagate conditionals + MPM.add(createCFGSimplificationPass()); // Merge & remove BBs + MPM.add(createInstructionCombiningPass()); // Combine silly seq's + MPM.add(createReassociatePass()); // Reassociate expressions + MPM.add(createLoopRotatePass()); // Rotate Loop + MPM.add(createLICMPass()); // Hoist loop invariants + MPM.add(createIndVarSimplifyPass()); // Canonicalize indvars + MPM.add(createLoopIdiomPass()); // Recognize idioms like memset. + MPM.add(createLoopDeletionPass()); // Delete dead loops + + MPM.add(createGVNPass()); // Remove redundancies + MPM.add(createMemCpyOptPass()); // Remove memcpy / form memset + MPM.add(createSCCPPass()); // Constant prop with SCCP + + MPM.add(createDeadStoreEliminationPass()); // Delete dead stores + MPM.add(createAggressiveDCEPass()); // Delete dead instructions + MPM.add(createFunctionAttrsPass()); // Set readonly/readnone attrs + + MPM.add(createTailCallEliminationPass()); // Eliminate tail calls + } + + // End of CallGraph SCC passes. 
+ } + + if (EnableLightWeightIPO) { + MPM.add(createGlobalOptimizerPass()); // Optimize out global vars + MPM.add(createIPSCCPPass()); // IP SCCP + MPM.add(createDeadArgEliminationPass()); // Dead argument elimination + MPM.add(createGlobalDCEPass()); // Remove dead fns and globals. + MPM.add(createConstantMergePass()); // Merge dup global constants + } +} + +void PassManagerBuilder::populateIPOPassManager(PassManagerBase &PM, bool Internalize, - bool RunInliner, - bool DisableGVNLoadPRE) { + bool RunInliner) { // Provide AliasAnalysis services for optimizations. addInitialAliasAnalysisPasses(PM); @@ -325,15 +393,9 @@ // Remove unused arguments from functions. PM.add(createDeadArgEliminationPass()); - // Reduce the code after globalopt and ipsccp. Both can open up significant - // simplification opportunities, and both can propagate functions through - // function pointers. When this happens, we often have to resolve varargs - // calls, etc, so let instcombine do this. - PM.add(createInstructionCombiningPass()); - // Inline small functions if (RunInliner) - PM.add(createFunctionInliningPass()); + PM.add(createFunctionInliningPass(255)); PM.add(createPruneEHPass()); // Remove dead EH info. @@ -346,35 +408,98 @@ // transform it to pass arguments by value instead of by reference. PM.add(createArgumentPromotionPass()); - // The IPO passes may leave cruft around. Clean up after them. - PM.add(createInstructionCombiningPass()); - PM.add(createJumpThreadingPass()); - // Break up allocas - if (UseNewSROA) - PM.add(createSROAPass()); - else - PM.add(createScalarReplAggregatesPass()); - // Run a few AA driven optimizations here and now, to cleanup the code. PM.add(createFunctionAttrsPass()); // Add nocapture. PM.add(createGlobalsModRefPass()); // IP alias analysis. +} - PM.add(createLICMPass()); // Hoist loop invariants. - PM.add(createGVNPass(DisableGVNLoadPRE)); // Remove redundancies. - PM.add(createMemCpyOptPass()); // Remove dead memcpys. - // Nuke dead stores. 
- PM.add(createDeadStoreEliminationPass()); +void PassManagerBuilder::populatePostIPOPM(PassManagerBase &PM) { + // In PostIPO phase, the choice for inlining is simple: either no inlining at + // all or just run the inliner which only inline tiny functions. This function + // has freedom to pick up which choice is more appropriate. + // + assert(Inliner == 0 && "Don't specify inliner"); + if (OptLevel == 0) + return; - // Cleanup and simplify the code after the scalar optimizations. - PM.add(createInstructionCombiningPass()); + bool EnableLightWeightIPO = (OptLevel > 1); - PM.add(createJumpThreadingPass()); + // Add LibraryInfo if we have some. + if (LibraryInfo) PM.add(new TargetLibraryInfo(*LibraryInfo)); - // Delete basic blocks, which optimization passes may have killed. - PM.add(createCFGSimplificationPass()); + addInitialAliasAnalysisPasses(PM); - // Now that we have optimized the program, discard unreachable functions. - PM.add(createGlobalDCEPass()); + // Start of CallGraph SCC passes. + { + if (EnableLightWeightIPO) { + PM.add(createTinyFuncInliningPass()); + PM.add(createFunctionAttrsPass()); // Set readonly/readnone attrs + } + + // Start of function pass. 
+ { + PM.add(createMemCpyOptPass()); // Remove memcpy / form memset + if (UseNewSROA) + PM.add(createSROAPass(/*RequiresDomTree*/ false)); + else + PM.add(createScalarReplAggregatesPass(-1, false)); + PM.add(createEarlyCSEPass()); // Catch trivial redundancies + PM.add(createSCCPPass()); // Constant prop with SCCP + PM.add(createJumpThreadingPass()); // Thread jumps + PM.add(createCorrelatedValuePropagationPass()); // Propagate conditionals + PM.add(createCFGSimplificationPass()); // Merge & remove BBs + PM.add(createReassociatePass()); // Reassociate expressions + PM.add(createLoopRotatePass()); // Rotate Loop + PM.add(createLICMPass()); // Hoist loop invariants + PM.add(createLoopUnswitchPass(SizeLevel || OptLevel < 3)); + PM.add(createIndVarSimplifyPass()); // Canonicalize indvars + PM.add(createLoopIdiomPass()); // Recognize idioms like memset. + PM.add(createLoopDeletionPass()); // Delete dead loops + + if (/*LoopVectorize &&*/ OptLevel > 1 && SizeLevel < 2) + PM.add(createLoopVectorizePass()); + + if (!DisableUnrollLoops) + PM.add(createLoopUnrollPass()); // Unroll small loops + + addExtensionsToPM(EP_LoopOptimizerEnd, PM); + + if (OptLevel > 1) + PM.add(createGVNPass()); // Remove redundancies + + PM.add(createInstructionCombiningPass()); + PM.add(createDeadStoreEliminationPass()); // Delete dead stores + PM.add(createAggressiveDCEPass()); // Delete dead instructions + if (UseNewSROA) + PM.add(createSROAPass(/*RequiresDomTree*/ false)); + else + PM.add(createScalarReplAggregatesPass(-1, false)); + + addExtensionsToPM(EP_ScalarOptimizerLate, PM); + + + // Add the various vectorization passes and relevant cleanup passes for + // them since we are no longer in the middle of the main scalar pipeline. 
+ if (/*LoopVectorize && */OptLevel > 1 && SizeLevel < 2) + PM.add(createLoopVectorizePass()); + + #if 1 + if (!DisableUnrollLoops) + PM.add(createLoopUnrollPass()); // Unroll small loops + #endif + + PM.add(createInstructionCombiningPass()); + + if (SLPVectorize) + PM.add(createSLPVectorizerPass()); // Vectorize parallel scalar chains. + } + } + + if (EnableLightWeightIPO) { + PM.add(createGlobalDCEPass()); // Remove dead fns and globals. + PM.add(createConstantMergePass()); // Merge dup global constants + } + addExtensionsToPM(EP_OptimizerLast, PM); } inline PassManagerBuilder *unwrap(LLVMPassManagerBuilderRef P) { @@ -458,5 +583,6 @@ LLVMBool RunInliner) { PassManagerBuilder *Builder = unwrap(PMB); PassManagerBase *LPM = unwrap(PM); - Builder->populateLTOPassManager(*LPM, Internalize != 0, RunInliner != 0); + Builder->populateIPOPassManager(*LPM, Internalize != 0, RunInliner != 0); + Builder->populatePostIPOPM(*LPM); } Index: lib/Transforms/IPO/InlineSimple.cpp =================================================================== --- lib/Transforms/IPO/InlineSimple.cpp (revision 187135) +++ lib/Transforms/IPO/InlineSimple.cpp (working copy) @@ -72,6 +72,10 @@ return new SimpleInliner(Threshold); } +Pass *llvm::createTinyFuncInliningPass() { + return new SimpleInliner(40); +} + bool SimpleInliner::runOnSCC(CallGraphSCC &SCC) { ICA = &getAnalysis(); return Inliner::runOnSCC(SCC); -------------- next part -------------- Index: include/clang/Frontend/CodeGenOptions.def =================================================================== --- include/clang/Frontend/CodeGenOptions.def (revision 187135) +++ include/clang/Frontend/CodeGenOptions.def (working copy) @@ -112,6 +112,7 @@ CODEGENOPT(VectorizeBB , 1, 0) ///< Run basic block vectorizer. CODEGENOPT(VectorizeLoop , 1, 0) ///< Run loop vectorizer. CODEGENOPT(VectorizeSLP , 1, 0) ///< Run SLP vectorizer. 
+CODEGENOPT(IsPreIPO , 1, 0) ///< Indicate in pre-IPO phase /// Attempt to use register sized accesses to bit-fields in structures, when /// possible. Index: include/clang/Driver/CC1Options.td =================================================================== --- include/clang/Driver/CC1Options.td (revision 187135) +++ include/clang/Driver/CC1Options.td (working copy) @@ -210,6 +210,8 @@ HelpText<"Run the SLP vectorization passes">; def vectorize_slp_aggressive : Flag<["-"], "vectorize-slp-aggressive">, HelpText<"Run the BB vectorization passes">; +def preipo : Flag<["-"], "preipo">, + HelpText<"Run the pre-IPO passes">; //===----------------------------------------------------------------------===// // Dependency Output Options Index: lib/Frontend/CompilerInvocation.cpp =================================================================== --- lib/Frontend/CompilerInvocation.cpp (revision 187135) +++ lib/Frontend/CompilerInvocation.cpp (working copy) @@ -402,6 +402,7 @@ Opts.VectorizeBB = Args.hasArg(OPT_vectorize_slp_aggressive); Opts.VectorizeLoop = Args.hasArg(OPT_vectorize_loops); Opts.VectorizeSLP = Args.hasArg(OPT_vectorize_slp); + Opts.IsPreIPO = Args.hasArg(OPT_preipo); Opts.MainFileName = Args.getLastArgValue(OPT_main_file_name); Opts.VerifyModule = !Args.hasArg(OPT_disable_llvm_verifier); Index: lib/Driver/Tools.cpp =================================================================== --- lib/Driver/Tools.cpp (revision 187135) +++ lib/Driver/Tools.cpp (working copy) @@ -2014,7 +2014,8 @@ CmdArgs.push_back("-emit-pth"); } else { assert(isa<CompileJobAction>(JA) && "Invalid action for clang tool."); - + if (D.IsUsingLTO(Args)) + CmdArgs.push_back("-preipo"); if (JA.getType() == types::TY_Nothing) { CmdArgs.push_back("-fsyntax-only"); } else if (JA.getType() == types::TY_LLVM_IR || Index: lib/CodeGen/BackendUtil.cpp =================================================================== --- lib/CodeGen/BackendUtil.cpp (revision 187135) +++ lib/CodeGen/BackendUtil.cpp (working copy)
@@ -274,6 +274,10 @@ switch (Inlining) { case CodeGenOptions::NoInlining: break; case CodeGenOptions::NormalInlining: { + if (CodeGenOpts.IsPreIPO) { + PMBuilder.Inliner = createTinyFuncInliningPass(); + break; + } // FIXME: Derive these constants in a principled fashion. unsigned Threshold = 225; if (CodeGenOpts.OptimizeSize == 1) // -Os @@ -321,7 +325,10 @@ MPM->add(createStripSymbolsPass(true)); } - PMBuilder.populateModulePassManager(*MPM); + if (!CodeGenOpts.IsPreIPO) + PMBuilder.populateModulePassManager(*MPM); + else + PMBuilder.populatePreIPOPassMgr(*MPM); } TargetMachine *EmitAssemblyHelper::CreateTargetMachine(bool MustCreateTM) {

From hfinkel at anl.gov  Sat Jul 27 20:32:04 2013
From: hfinkel at anl.gov (Hal Finkel)
Date: Sat, 27 Jul 2013 22:32:04 -0500 (CDT)
Subject: [LLVMdev] arch-specific predefines in LLVM's source
In-Reply-To: 
Message-ID: <852058727.15015726.1374982324462.JavaMail.root@alcf.anl.gov>

----- Original Message -----
> Hi,
>
>
> ----- Original Message -----
> >>> ----- Original Message -----
> >>>> Hi all,
> >>>> My recent commit r187027 fixed a simple oversight of forgetting
> >>>> to check for __ppc__ (only checking __powerpc__), which broke my
> >>>> powerpc-apple-darwin8 stage1 tests, since the system gcc only
> >>>> provided __ppc__. I was wondering if this justifies using
> >>>> simpler macros like
> >>>>
> >>>> #define LLVM_PPC (defined(__ppc__) || defined(__powerpc__) ...)
> >>>> #define LLVM_PPC64 (defined(__ppc64__) || defined(__powerpc64__) ...)
> >> or define more clearly:
> >>
> >> #define LLVM_PPC32 ((defined(__ppc__) || (defined(__powerpc__) &&
> >> !LLVM_PPC64))
> >> #define LLVM_PPC_ANY (LLVM_PPC32 || LLVM_PPC64)
> >>
> >> or
> >>
> >> #define LLVM_PPC_ANY (defined(__ppc__) || defined(__powerpc__))
> >> #define LLVM_PPC32 (LLVM_PPC_ANY && !LLVM_PPC64)
> >
> > I thought that you had discovered that on Darwin __ppc__, etc. are
> > defined only for 32-bit builds. Is that correct?
> >
> > -Hal
>
> Ah yes, with apple-gcc-4.0.1 (powerpc)
>
> -m32 only defines __ppc__
> -m64 only defines __ppc64__
>
> So a more correct set of definitions would be:
>
> #define LLVM_PPC32 (defined(__ppc__) || (defined(__powerpc__) &&
> !defined(__powerpc64__)))
> #define LLVM_PPC64 (defined(__ppc64__) || defined(__powerpc64__))
> #define LLVM_PPC_ANY (LLVM_PPC32 || LLVM_PPC64)

Yep, and the following seems safer still (and slightly cleaner):

#define LLVM_PPC64 (defined(__ppc64__) || defined(__powerpc64__))
#define LLVM_PPC32 (defined(__ppc__) || (defined(__powerpc__) && !LLVM_PPC64))
#define LLVM_PPC_ANY (LLVM_PPC32 || LLVM_PPC64)

As for where to put them, I'm still not quite sure. Support/Compiler.h does seem like the best current fit. If you propose a patch for review, I imagine someone will share an opinion ;)

Thanks again,
Hal

>
> Fang
>
> >>>> I've even seen __POWERPC__, _POWER, _ARCH_PPC being tested in
> >>>> conditionals.
> >>>>
> >>>> These proposed standardized macros would only be used in LLVM
> >>>> project sources; there's no reason to export them.
> >>>> The standardized macros would simplify conditionals and make
> >>>> their use less error-prone.
> >>>>
> >>>> What predefines do other architectures use?
> >>>
> >>> Would all uses of these macros be restricted to the PPC backend,
> >>> or would they appear elsewhere as well?
> >>
> >> One place I see these predefines outside of the PPC backend is
> >> lib/Support/Host.cpp
> >>
> >> Fang
> >>
> >>> -Hal
> >>>
> >>>>
> >>>> What would be a suitable place for these proposed macros?
> >>>> include/llvm/Support/Compiler.h?
> >>>> > >>>> Fang > >> > >> > >> -- > >> David Fang > >> http://www.csl.cornell.edu/~fang/ > >> > >> > > > > > > -- > David Fang > http://www.csl.cornell.edu/~fang/ > > -- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory From nrotem at apple.com Sat Jul 27 23:54:29 2013 From: nrotem at apple.com (Nadav Rotem) Date: Sat, 27 Jul 2013 23:54:29 -0700 Subject: [LLVMdev] Enabling the SLP-vectorizer by default for -O3 Message-ID: <4A0723D2-9E95-4CAF-9E83-0B33EFCA3ECE@apple.com> Hi, Below you can see the updated benchmark results for the new SLP-vectorizer. As you can see, there is a small number of compile time regressions, a single major runtime *regression, and many performance gains. There is a tiny increase in code size: 30k for the whole test-suite. Based on the numbers below I would like to enable the SLP-vectorizer by default for -O3. Please let me know if you have any concerns. Thanks, Nadav * - I now understand the Olden/BH regression better. BH is slower because of a store-buffer stall. This means that the store buffer fills up and the CPU has to wait for some stores to finish. I can think of two reasons that may cause this problem. First, our vectorized stores are followed by a memcpy that's expanded to a list of scalar-read/writes to the same addresses as the vector store. Maybe the processors can’t prune multiple stores to the same address with different sizes (Section 2.2.4 in the optimization guide has some info on this). Another possibility (less likely) is that we increase the critical path by adding a new pshufd instruction before the last vector store and that affects the store-buffer somehow. In any case, there is not much we can do at the IR-level to predict this. 
Performance Regressions - Compile Time Δ Previous Current σ
MultiSource/Benchmarks/VersaBench/beamformer/beamformer 18.98% 0.0722 0.0859 0.0003
MultiSource/Benchmarks/FreeBench/pifft/pifft 5.66% 0.5003 0.5286 0.0015
MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt 4.85% 0.4084 0.4282 0.0014
MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt 4.36% 0.3856 0.4024 0.0018
MultiSource/Benchmarks/TSVC/ControlFlow-flt/ControlFlow-flt 2.62% 0.4424 0.4540 0.0019
External/SPEC/CINT2006/401_bzip2/401_bzip2 1.50% 1.0613 1.0772 0.0010
MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4 1.23% 12.1337 12.2831 0.0296
MultiSource/Applications/kimwitu++/kc 1.15% 9.3690 9.4769 0.0186
SingleSource/Benchmarks/Misc-C++-EH/spirit 1.13% 3.2769 3.3139 0.0079
External/SPEC/CFP2000/188_ammp/188_ammp 1.01% 1.8632 1.8820 0.0059

Performance Regressions - Execution Time Δ Previous Current σ
MultiSource/Benchmarks/Olden/bh/bh 19.24% 1.1551 1.3773 0.0021
SingleSource/Benchmarks/SmallPT/smallpt 3.75% 5.8779 6.0983 0.0146
SingleSource/Benchmarks/Misc-C++/Large/ray 1.08% 1.8194 1.8390 0.0009

Performance Improvements - Execution Time Δ Previous Current σ
SingleSource/Benchmarks/Misc/matmul_f64_4x4 -53.67% 1.4064 0.6516 0.0007
External/Nurbs/nurbs -19.47% 2.5389 2.0445 0.0029
MultiSource/Benchmarks/Olden/power/power -18.49% 1.2572 1.0248 0.0004
SingleSource/Benchmarks/Misc/flops-4 -15.93% 0.7767 0.6530 0.0348
MultiSource/Benchmarks/TSVC/LoopRerolling-flt/LoopRerolling-flt -14.72% 2.3925 2.0404 0.0013
SingleSource/Benchmarks/Misc/flops-6 -11.05% 1.1427 1.0164 0.0009
SingleSource/Benchmarks/Misc/flops-5 -10.43% 1.2771 1.1439 0.0015
MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt -8.10% 2.3468 2.1568 0.0195
SingleSource/Benchmarks/Misc/pi -7.18% 0.6042 0.5608 0.0000
External/SPEC/CFP2006/444_namd/444_namd -4.01% 9.6053 9.2200 0.0064
SingleSource/Benchmarks/Linpack/linpack-pc -3.85% 95.5313 91.8522 1.1151
MultiSource/Benchmarks/TSVC/LoopRerolling-dbl/LoopRerolling-dbl -3.52% 3.1962 3.0837 0.0063
MultiSource/Benchmarks/TSVC/LinearDependence-dbl/LinearDependence-dbl -2.93% 2.9336 2.8477 0.0037
MultiSource/Benchmarks/VersaBench/beamformer/beamformer -2.79% 0.8845 0.8598 0.0026
SingleSource/Benchmarks/Misc-C++/Large/sphereflake -2.79% 1.8517 1.8001 0.0014
External/SPEC/CFP2000/177_mesa/177_mesa -2.15% 1.7214 1.6844 0.0017
SingleSource/Benchmarks/CoyoteBench/fftbench -2.05% 0.7280 0.7131 0.0049
MultiSource/Benchmarks/TSVC/NodeSplitting-dbl/NodeSplitting-dbl -1.96% 3.1494 3.0878 0.0034
SingleSource/Benchmarks/Misc/oourafft -1.70% 3.4625 3.4035 0.0009
SingleSource/Benchmarks/Misc/flops -1.31% 7.0775 6.9845 0.0014
MultiSource/Applications/JM/lencod/lencod -1.12% 4.5972 4.5455 0.0050
-------------- next part -------------- An HTML attachment was scrubbed... URL: From chandlerc at google.com Sun Jul 28 00:20:24 2013 From: chandlerc at google.com (Chandler Carruth) Date: Sun, 28 Jul 2013 00:20:24 -0700 Subject: [LLVMdev] Enabling the SLP-vectorizer by default for -O3 In-Reply-To: <4A0723D2-9E95-4CAF-9E83-0B33EFCA3ECE@apple.com> References: <4A0723D2-9E95-4CAF-9E83-0B33EFCA3ECE@apple.com> Message-ID: Sorry for not posting sooner. I forgot to send an update with the results. I also have some benchmark data. It confirms much of what you posted -- binary size increase is essentially 0, performance increases across the board. It looks really good to me. However, there was one crash that I'd like to check if it still fires. Will update later today (feel free to ping me if you don't hear anything). That said, why -O3? I think we should just enable this across the board, as it doesn't seem to cause any size regression under any mode, and the compile time hit is really low. On Sat, Jul 27, 2013 at 11:54 PM, Nadav Rotem wrote: > Hi, > > Below you can see the updated benchmark results for the new > SLP-vectorizer.
> As you can see, there is a small number of compile time > regressions, a single major runtime *regression, and many performance > gains. There is a tiny increase in code size: 30k for the whole test-suite. > Based on the numbers below I would like to enable the SLP-vectorizer by > default for -O3. Please let me know if you have any concerns. > > Thanks, > Nadav > > > * - I now understand the Olden/BH regression better. BH is slower because > of a store-buffer stall. This means that the store buffer fills up and the > CPU has to wait for some stores to finish. I can think of two reasons > that may cause this problem. First, our vectorized stores are followed by > a memcpy that's expanded to a list of scalar-read/writes to the same > addresses as the vector store. Maybe the processors can't prune multiple > stores to the same address with different sizes (Section 2.2.4 in the > optimization guide has some info on this). Another possibility (less > likely) is that we increase the critical path by adding a new pshufd > instruction before the last vector store and that affects the store-buffer > somehow. In any case, there is not much we can do at the IR-level to > predict this.
> 
> Performance Regressions - Compile Time Δ Previous Current σ
> MultiSource/Benchmarks/VersaBench/beamformer/beamformer 18.98% 0.0722 0.0859 0.0003
> MultiSource/Benchmarks/FreeBench/pifft/pifft 5.66% 0.5003 0.5286 0.0015
> MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt 4.85% 0.4084 0.4282 0.0014
> MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt 4.36% 0.3856 0.4024 0.0018
> MultiSource/Benchmarks/TSVC/ControlFlow-flt/ControlFlow-flt 2.62% 0.4424 0.4540 0.0019
> External/SPEC/CINT2006/401_bzip2/401_bzip2 1.50% 1.0613 1.0772 0.0010
> MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4 1.23% 12.1337 12.2831 0.0296
> MultiSource/Applications/kimwitu++/kc 1.15% 9.3690 9.4769 0.0186
> SingleSource/Benchmarks/Misc-C++-EH/spirit 1.13% 3.2769 3.3139 0.0079
> External/SPEC/CFP2000/188_ammp/188_ammp 1.01% 1.8632 1.8820 0.0059
> 
> Performance Regressions - Execution Time Δ Previous Current σ
> MultiSource/Benchmarks/Olden/bh/bh 19.24% 1.1551 1.3773 0.0021
> SingleSource/Benchmarks/SmallPT/smallpt 3.75% 5.8779 6.0983 0.0146
> SingleSource/Benchmarks/Misc-C++/Large/ray 1.08% 1.8194 1.8390 0.0009
> 
> Performance Improvements - Execution Time Δ Previous Current σ
> SingleSource/Benchmarks/Misc/matmul_f64_4x4 -53.67% 1.4064 0.6516 0.0007
> External/Nurbs/nurbs -19.47% 2.5389 2.0445 0.0029
> MultiSource/Benchmarks/Olden/power/power -18.49% 1.2572 1.0248 0.0004
> SingleSource/Benchmarks/Misc/flops-4 -15.93% 0.7767 0.6530 0.0348
> MultiSource/Benchmarks/TSVC/LoopRerolling-flt/LoopRerolling-flt -14.72% 2.3925 2.0404 0.0013
> SingleSource/Benchmarks/Misc/flops-6 -11.05% 1.1427 1.0164 0.0009
> SingleSource/Benchmarks/Misc/flops-5 -10.43% 1.2771 1.1439 0.0015
> MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt -8.10% 2.3468 2.1568 0.0195
> SingleSource/Benchmarks/Misc/pi -7.18% 0.6042 0.5608 0.0000
> External/SPEC/CFP2006/444_namd/444_namd -4.01% 9.6053 9.2200 0.0064
> SingleSource/Benchmarks/Linpack/linpack-pc -3.85% 95.5313 91.8522 1.1151
> MultiSource/Benchmarks/TSVC/LoopRerolling-dbl/LoopRerolling-dbl -3.52%
3.1962 3.0837 0.0063
> MultiSource/Benchmarks/TSVC/LinearDependence-dbl/LinearDependence-dbl -2.93% 2.9336 2.8477 0.0037
> MultiSource/Benchmarks/VersaBench/beamformer/beamformer -2.79% 0.8845 0.8598 0.0026
> SingleSource/Benchmarks/Misc-C++/Large/sphereflake -2.79% 1.8517 1.8001 0.0014
> External/SPEC/CFP2000/177_mesa/177_mesa -2.15% 1.7214 1.6844 0.0017
> SingleSource/Benchmarks/CoyoteBench/fftbench -2.05% 0.7280 0.7131 0.0049
> MultiSource/Benchmarks/TSVC/NodeSplitting-dbl/NodeSplitting-dbl -1.96% 3.1494 3.0878 0.0034
> SingleSource/Benchmarks/Misc/oourafft -1.70% 3.4625 3.4035 0.0009
> SingleSource/Benchmarks/Misc/flops -1.31% 7.0775 6.9845 0.0014
> MultiSource/Applications/JM/lencod/lencod -1.12% 4.5972 4.5455 0.0050
> 
> _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chandlerc at google.com Sun Jul 28 01:22:37 2013 From: chandlerc at google.com (Chandler Carruth) Date: Sun, 28 Jul 2013 01:22:37 -0700 Subject: [LLVMdev] Questions about the semantics for lifetime intrinsics... Message-ID: So, in hacking on mem2reg I noticed that it doesn't actually implement optimizations enabled by lifetime markers. I thought I might take a stab at teaching it about them, but I'm left with some questions about the semantics. Much of this may have been hashed out when they were added, and if so I'll appreciate your help educating me, and maybe we can come up with improved documentation to cover this. First, is there any realistic intent that these be used for heap pointers? If so, what is the expected use case? Second, if the answer to the first is 'no', then could we remove the 'start' intrinsic? It seems redundant as the value of an alloca prior to a store to that alloca is already 'undef'. Third, what is the semantic model intended for the size argument?
The documentation says: """ The first argument is a constant integer representing the size of the object, or -1 if it is variable sized. """ Part of this seems a bit confusingly specified -- what does it mean by "the object"? I assume what it really means is the object read from memory by a corresponding load, but that in and of itself is confusing because the pointer passed to this routine is typically not the pointer loaded. There might be many different loads of different type objects all corresponding to the same lifetime intrinsic. The best way I have of interpreting it is in terms of the 'end' intrinsic: the result of it is equivalent to that of a notional store of '[N x i8] undef' to the provided pointer where N is the size. However, this becomes truly muddy in the presence of '-1' size which just means a "variable" size. But how much is variable? Where does the undef stop? I think the whole thing would be somewhat clearer as an intrinsic with an arbitrary pointer type and a boolean flag for 'is_variable_length'. If the flag is false (common), the pointee type's store size is the size of the region whose lifetime is marked. If the flag is true, the pointer must be an alloca instruction itself with a runtime size, and the entire alloca's lifetime is marked. Among other benefits, this would make mem2reg and other analyses easier and faster by decreasing the bitcast or gep instructions that result from common frontend lowering patterns. It also matches more closely the behavior of load and store. -Chandler -------------- next part -------------- An HTML attachment was scrubbed... URL: From rkotler at mips.com Sun Jul 28 14:36:31 2013 From: rkotler at mips.com (reed kotler) Date: Sun, 28 Jul 2013 14:36:31 -0700 Subject: [LLVMdev] IntrinsicLowering::AddPrototypes Message-ID: <51F58EDF.3030101@mips.com> It seems that several intrinsics are missing from this routine. In particular, floor, which was causing problems in the mips16 port.
Is there some reason to not add the ones that are missing? For example, adding the following fixed my problem with floor. case Intrinsic::floor: EnsureFPIntrinsicsExist(M, I, "floorf", "floor", "floor"); break; From rkotler at mips.com Sun Jul 28 15:02:40 2013 From: rkotler at mips.com (Reed Kotler) Date: Sun, 28 Jul 2013 15:02:40 -0700 Subject: [LLVMdev] IntrinsicLowering::AddPrototypes In-Reply-To: <51F58EDF.3030101@mips.com> References: <51F58EDF.3030101@mips.com> Message-ID: <51F59500.3070106@mips.com> Ooops... Ignore this previous mail. The problem still exists with this change. On 07/28/2013 02:36 PM, reed kotler wrote: > It seems that several intrinsics are missing from this routine. > > In particular, floor, which was causing problems in the mips16 port. > > Is there some reason to not add the ones that are missing? > > For example, adding the following fixed my problem with floor. > > case Intrinsic::floor: > EnsureFPIntrinsicsExist(M, I, "floorf", "floor", "floor"); > break; From tobias at grosser.es Sun Jul 28 16:42:25 2013 From: tobias at grosser.es (Tobias Grosser) Date: Sun, 28 Jul 2013 16:42:25 -0700 Subject: [LLVMdev] [Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass In-Reply-To: <1dd45540.5904.140258d029f.Coremail.tanmx_star@yeah.net> References: <5ce10763.6ff8.14019233ac7.Coremail.tanmx_star@yeah.net> <51F213DB.10700@grosser.es> <1dd45540.5904.140258d029f.Coremail.tanmx_star@yeah.net> Message-ID: <51F5AC61.2080108@grosser.es> On 07/28/2013 06:52 AM, Star Tan wrote: > Hi Tobias, > > I tried to investigate the problem related to ScopInfo, but I need your > help on handling some problems about ISL and SCEV. I copied the list as the discussion may be helpful for others. @Sven, no need to read all. Just search for your name. [..] >>The interesting observation is that Polly introduces three parameters >>(p_0, p_1, p_2) for this SCoP, even though in the C source code only the >>variable 'i' is SCoP invariant.
However, due to the way >>SCEVExpr(essions) in LLVM are nested, Polly sees three scop-invariant >>SCEVExpr(essions) which are all translated into independent parameters. >>However, as we can see, the only difference between the three >>parameters is a different constant in the base of the AddRecExpr. If we >>would just introduce p_0 (the parameter where the scev-base is zero) and >>express any use of p_1 as p_0 + 2 and p_2 as p_0 + 4, isl could solve >>this problem very quickly. > > Currently, Polly introduces three parameters because it treats (base, step) as a whole for each parameter. You are right, it seems we could use a single parameter to represent all these accesses. > > I have attached a patch file for this purpose. This patch file is just a hack to see whether the three parameters can be merged into one parameter and whether the compile-time is reduced after that. The key of this patch is the added source code in Scop::addParams, which only adds parameters if the new parameter has a different "step" value. In other words, we skip the "base" value when comparing parameters. > With this patch file, the Scop becomes:
> 
> Context:
> p0: {0,+,128}<%for.cond2.preheader>
> Statements {
> Stmt_for_body6
> Domain :=
> { Stmt_for_body6[i0] : i0 >= 0 and i0 <= 7 };
> Scattering :=
> { Stmt_for_body6[i0] -> scattering[0, i0, 0] };
> ReadAccess :=
> [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = p_0 + 16i0 };
> ReadAccess :=
> [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = 2 + p_0 + 16i0 };
> MustWriteAccess :=
> [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = p_0 + 16i0 };
> MustWriteAccess :=
> [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = 2 + p_0 + 16i0 };
> MustWriteAccess :=
> [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = 4 + p_0 + 16i0 };
> }
> 
> Unfortunately, the compile-time is still very high! This _looks_ already a lot better. Very nice! > I have investigated a little about it.
Maybe this is because isl_union_map_add_map in Polly-dependence would still introduce new parameters based on the results produced by ScopInfo.
> 
> Without this patch file, the Read, Write, MayWrite (input to the ISL library) calculated by collectInfo are:
> 
> Read: [p_0, p_1, p_2] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : (2o0 = p_0 + 16i0 and i0 >= 0 and i0 <= 7) or (2o0 = p_1 + 16i0 and i0 >= 0 and i0 <= 7) }
> Write: [p_0, p_1, p_2] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : (2o0 = p_0 + 16i0 and i0 >= 0 and i0 <= 7) or (2o0 = p_1 + 16i0 and i0 >= 0 and i0 <= 7) or (2o0 = p_2 + 16i0 and i0 >= 0 and i0 <= 7) }
> Maywrite: [p_0, p_1, p_2] -> { }
> Schedule: [p_0, p_1, p_2] -> { Stmt_for_body6[i0] -> scattering[0, i0, 0] }
> 
> But with our patch file, they become:
> 
> Read: [p_0, p_0'] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : (2o0 = p_0' + 16i0 and i0 >= 0 and i0 <= 7) or (2o0 = 2 + p_0 + 16i0 and i0 >= 0 and i0 <= 7) }
> Write: [p_0, p_0', p_0''] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : (2o0 = p_0'' + 16i0 and i0 >= 0 and i0 <= 7) or (2o0 = 2 + p_0' + 16i0 and i0 >= 0 and i0 <= 7) or (2o0 = 4 + p_0 + 16i0 and i0 >= 0 and i0 <= 7) }
> MayWrite: { }
> Schedule: { Stmt_for_body6[i0] -> scattering[0, i0, 0] }
> It seems that the isl_union_map_add_map function called from Dependences::collectInfo automatically introduces extra parameters, but I have no idea why this happens.
> 
> Could you give me some further suggestions on this problem? Oh, you are on the right track. Even though we only see parameters 'p_0', isl seems to believe those values 'p_0' are not identical. We see this as isl renames them such that the difference becomes obvious. Unfortunately, as long as isl does not believe those parameters are identical, it will still be stuck in long calculations. To understand the above behaviour we need to understand how isl determines if two parameter dimensions refer to the same parameter.
What you would probably expect is that the union of the two sets [p] -> {[i] = p} and [p] -> {[i] = p} is a single set [p] -> {[i] = p}, as the two parameters have the very same name. However, this is not always true. Even though parameters share the same name, they may in fact be different isl_id objects. isl_id_alloc() takes a parameter name and a pointer, and only if both are identical is the very same isl_id object returned. Sven: In terms of making the behaviour of isl easier to understand, it may make sense to fail/assert in case operands have parameters that are named identically but refer to different pointer values. So the question is why you obtain such different isl_ids and what can be done to fix this. I don't have a solution ready, but I have some questions in the patch that may help you when looking into this.
> diff --git a/include/polly/ScopInfo.h b/include/polly/ScopInfo.h
> index 8c56582..bab7763 100755
> --- a/include/polly/ScopInfo.h
> +++ b/include/polly/ScopInfo.h
> @@ -445,6 +445,7 @@ class Scop {
> /// The isl_ids that are used to represent the parameters
> typedef std::map ParamIdType;
> ParamIdType ParameterIds;
> + ParamIdType ParameterIdsSpace;
Why are you introducing another member variable here? Why don't you keep using ParameterIds? What is this variable supposed to track?
> diff --git a/lib/Analysis/Dependences.cpp b/lib/Analysis/Dependences.cpp
> index 5a185d0..9f918f3 100644
> --- a/lib/Analysis/Dependences.cpp
> +++ b/lib/Analysis/Dependences.cpp
> @@ -30,6 +30,9 @@
> #include
> #include
> 
> +#define DEBUG_TYPE "polly-dependence"
> +#include "llvm/Support/Debug.h"
> +
> using namespace polly;
> using namespace llvm;
> 
> @@ -88,8 +91,15 @@ void Dependences::collectInfo(Scop &S, isl_union_map **Read,
> void Dependences::calculateDependences(Scop &S) {
> isl_union_map *Read, *Write, *MayWrite, *Schedule;
> 
> + DEBUG(dbgs() << "Scop: " << S << "\n");
> +
> collectInfo(S, &Read, &Write, &MayWrite, &Schedule);
> 
> + DEBUG(dbgs() << "Read: " << Read << "\n";
> + dbgs() << "Write: " << Write << "\n";
> + dbgs() << "MayWrite: " << MayWrite << "\n";
> + dbgs() << "Schedule: " << Schedule << "\n");
> +
> if (OptAnalysisType == VALUE_BASED_ANALYSIS) {
> isl_union_map_compute_flow(
> isl_union_map_copy(Read), isl_union_map_copy(Write),
> @@ -131,6 +141,8 @@ void Dependences::calculateDependences(Scop &S) {
> RAW = isl_union_map_coalesce(RAW);
> WAW = isl_union_map_coalesce(WAW);
> WAR = isl_union_map_coalesce(WAR);
> +
> + DEBUG(printScop(dbgs()));
> }
This patch looks good/helpful by itself. I propose to submit it separately for commit.
> bool Dependences::runOnScop(Scop &S) {
> diff --git a/lib/Analysis/ScopInfo.cpp b/lib/Analysis/ScopInfo.cpp
> index aa72f3e..6fc838e 100644
> --- a/lib/Analysis/ScopInfo.cpp
> +++ b/lib/Analysis/ScopInfo.cpp
> @@ -86,6 +86,14 @@ public:
> isl_aff *Affine =
> isl_aff_zero_on_domain(isl_local_space_from_space(Space));
> Affine = isl_aff_add_coefficient_si(Affine, isl_dim_param, 0, 1);
> + if (Scev->getSCEVType() == scAddRecExpr) {
> + const SCEVAddRecExpr *AR = cast<SCEVAddRecExpr>(Scev);
The canonical pattern for this is: if (const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Scev)) {
> + const SCEVConstant *c = cast<SCEVConstant>(AR->getOperand(0));
This is obviously a hack. The base is not always a constant.
You can probably just use something like isl_pw_aff *BaseValue = visit(AR->getOperand(0)); Affine = isl_pw_aff_sum(Affine, BaseValue);
> + isl_int t;
> + isl_int_init(t);
> + isl_int_set_si(t, c->getValue()->getSExtValue());
We now use isl_val instead of isl_int.
> return isl_pw_aff_alloc(Domain, Affine);
> }
> @@ -717,6 +725,34 @@ void Scop::addParams(std::vector NewParameters) {
> if (ParameterIds.find(Parameter) != ParameterIds.end())
> continue;
> 
> + if (Parameter->getSCEVType() == scAddRecExpr) {
> + int index, maxIndex = Parameters.size();
> + const SCEVAddRecExpr *AR = cast<SCEVAddRecExpr>(Parameter);
> +
> + for (index = 0; index < Parameters.size(); ++index) {
> + const SCEV *pOld = Parameters[index];
> +
> + if (pOld->getSCEVType() != scAddRecExpr)
> + continue;
> +
> + const SCEVAddRecExpr *OAR = cast<SCEVAddRecExpr>(pOld);
> + if (AR->getNumOperands() != OAR->getNumOperands())
> + continue;
> +
> + unsigned i, e;
> + for (i = 1, e = AR->getNumOperands(); i != e; ++i){
> + if (AR->getOperand(i) != OAR->getOperand(i))
> + break;
> + }
> +
> + if (i == e) {
> + // Parameters.push_back(Parameter);
> + ParameterIds[Parameter] = index;
> + return;
> + }
> + }
> + }
I think this is the right idea, but probably the wrong place to put it. I would put this into SCEVValidator::visitAddRecExpr. This function always adds the AddRecExpr itself as a parameter, whenever it is found to be parametric. However, what we should do is to create a new ScevExpr that starts at zero and is otherwise identical. We then add this as a parameter. When doing this, we now also need to keep all the parameters that have been found previously in the base expression.
> int dimension = Parameters.size();
> 
> Parameters.push_back(Parameter);
> @@ -777,9 +813,9 @@ void Scop::addParameterBounds() {
> 
> void Scop::realignParams() {
> // Add all parameters into a common model.
> - isl_space *Space = isl_space_params_alloc(IslCtx, ParameterIds.size()); > + isl_space *Space = isl_space_params_alloc(IslCtx, ParameterIdsSpace.size()); > > - for (ParamIdType::iterator PI = ParameterIds.begin(), PE = ParameterIds.end(); > + for (ParamIdType::iterator PI = ParameterIdsSpace.begin(), PE = ParameterIdsSpace.end(); I do not see why those changes are necessary. > PI != PE; ++PI) { > const SCEV *Parameter = PI->first; > isl_id *id = getIdForParam(Parameter); > @@ -787,7 +823,7 @@ void Scop::realignParams() { > } > > // Align the parameters of all data structures to the model. > - Context = isl_set_align_params(Context, Space); > +// Context = isl_set_align_params(Context, Space); Also no idea why this one is necessary. Hope this helps, Tobias From mohammad.r.haghighat at intel.com Sun Jul 28 17:13:41 2013 From: mohammad.r.haghighat at intel.com (Haghighat, Mohammad R) Date: Mon, 29 Jul 2013 00:13:41 +0000 Subject: [LLVMdev] [icFuzz] Help needed with analyzing randomly generated tests that fail on clang 3.4 trunk In-Reply-To: <94037EC670985B4A85538143FF8D81CD7263734A@ORSMSX107.amr.corp.intel.com> References: <668642398.8242115.1372268637416.JavaMail.root@alcf.anl.gov> <82364744.14764035.1374852805819.JavaMail.root@alcf.anl.gov> <94037EC670985B4A85538143FF8D81CD7263734A@ORSMSX107.amr.corp.intel.com> Message-ID: <94037EC670985B4A85538143FF8D81CD72637766@ORSMSX107.amr.corp.intel.com> Hal, Just posted a package containing 214 small tests showing bugs in the latest Clang (3.4 trunk 187225) on MacOS X when compiled at -O2. http://llvm.org/bugs/show_bug.cgi?id=16431 These are new tests different from the previously posted ones, but their root causes could be the same as before or could actually be new bugs. 
Cheers, -moh -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Haghighat, Mohammad R Sent: Friday, July 26, 2013 8:19 PM To: Hal Finkel Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] [icFuzz] Help needed with analyzing randomly generated tests that fail on clang 3.4 trunk Hal, I ran the failing tests in the attachment to the bug 16431 on the latest clang trunk (version 3.4 trunk 187225). http://llvm.org/bugs/show_bug.cgi?id=16431 The following tests still fail: Tests in diff: t10236 t12206 t2581 t6734 t7788 t7820 t8069 t9982 All tests in InfLoopInClang: t19193 t22300 t25903 t27872 t33143 t8543 Meanwhile, I'll launch a new run of icFuzz and will post the results later. -moh -----Original Message----- From: Hal Finkel [mailto:hfinkel at anl.gov] Sent: Friday, July 26, 2013 8:33 AM To: Haghighat, Mohammad R Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] [icFuzz] Help needed with analyzing randomly generated tests that fail on clang 3.4 trunk ----- Original Message ----- > ----- Original Message ----- > > Great job, Hal! > > > > Sure. I'd be happy to run icFuzz and report the fails once these > > bugs are fixed and thereafter whenever people want new runs. > > Obviously, this can be automated, but the problem is that icFuzz is > > not currently open sourced. > > I would be happy to see this open sourced, but I think that we can > work something out regardless. > > Also, once we get the current set of things resolved, I think it would > be useful to test running with: > > -- -O3, LTO (-O4 or -flto), > -- -fslp-vectorize, -fslp-vectorize-aggressive (which are actually > separate optimizations) > -- -ffast-math (if you can do floating point with tolerances, or at > least -ffinite-math-only), -fno-math-errno (and there are obviously a > whole bunch of non-default code-generation and target options). > > Is it feasible to set up runs with different flags? 
> > > Once there's a bug in the compiler, there's really no limit in the > > number of failing tests that can be generated, so it's more > > productive to run the generator after the previously reported bugs > > are fixed. > > Agreed. > > > > > We've also seen cases where the results of "clang -O2" are different > > on Mac vs. Linux/Windows. > > I recall an issue related to default settings for FP, and differences > with libm implementation. Are there non-floating-point cases? > > > > > Just let me know when you want a new run. > > Will do! Mohammad, Can you please re-run these now? I know that the original loop-vectorizer bugs causing the miscompiles have been fixed, and the others also seem to have been resolved as well. Thanks again, Hal > > -Hal > > > > > Cheers, > > -moh > > > > -----Original Message----- > > From: Hal Finkel [mailto:hfinkel at anl.gov] > > Sent: Wednesday, June 26, 2013 7:35 AM > > To: Haghighat, Mohammad R > > Cc: llvmdev at cs.uiuc.edu; Jim Grosbach > > Subject: Re: [LLVMdev] [icFuzz] Help needed with analyzing randomly > > generated tests that fail on clang 3.4 trunk > > > > ----- Original Message ----- > > > > > > Hi Moh, > > > > > > > > > Thanks for this. I’m really glad to see the work you’re doing in > > > this area and believe it will be extremely helpful in improving > > > the quality of the compiler. > > > > > > > > > -Jim > > > > > > > > > > > > On Jun 24, 2013, at 4:10 PM, Haghighat, Mohammad R < > > > mohammad.r.haghighat at intel.com > wrote: > > > > > > > > > > > > > > > > > > Hi, > > > > > > I just submitted a bug report with a package containing 107 small > > > test cases that fail on the latest LLVM/clang 3.4 main trunk > > > (184563). Included are test sources, compilation commands, test > > > input files, and results at –O0 and –O2 when applicable. 
> > > > > > http://llvm.org/bugs/show_bug.cgi?id=16431 > > > > > > These tests have been automatically generated by an internal tool > > > at Intel, the Intel Compiler fuzzer, icFuzz. The tests are > > > typically very small. For example, for the following simple loop > > > (test > > > t5702) > > > on MacOS X, clang at –O2 generates a binary that crashes: > > > > > > // Test Loop Interchange > > > for (j = 2; j < 76; j++) { > > > for (jm = 1; jm < 30; jm++) { > > > h[j-1][jm-1] = j + 83; > > > } > > > } > > > > > > The tests are put in to two categories > > > - tests that have different runtime outputs when compiled at -O0 > > > and > > > -O2 (this category also includes runtime crashes) > > > - tests that cause infinite loops in the Clang optimizer > > > > > > Many of these failing tests could be due to the same bug, thus a > > > much smaller number of root problems are expected. > > > > > > Any help with triaging these bugs would be highly appreciated. > > > > I've gone through all of the miscompile cases, used bugpoint to > > reduce them, and opened individual PRs for several distinct bugs. > > So > > far we have: PR16455 (loop vectorizer), PR16457 (sccp), PR16460 > > (instcombine). Thanks again for doing this! Do you plan on repeating > > this testing on a regular basis? Can it be automated? 
> > > > -Hal > > > > > > > > Thanks, > > > -moh > > > > > > _______________________________________________ > > > LLVM Developers mailing list > > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > > _______________________________________________ > > > LLVM Developers mailing list > > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > > > -- > > Hal Finkel > > Assistant Computational Scientist > > Leadership Computing Facility > > Argonne National Laboratory > > > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory _______________________________________________ LLVM Developers mailing list LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From eda-qa at disemia.com Sun Jul 28 20:49:04 2013 From: eda-qa at disemia.com (edA-qa mort-ora-y) Date: Mon, 29 Jul 2013 05:49:04 +0200 Subject: [LLVMdev] PointerType without body, post-construction set type? Message-ID: <51F5E630.4000501@disemia.com> With StructType you can create an empty structure and then call setBody later. How can one do the same thing with a PointerType? I'm translating a recursive structure which includes pointers to itself. I end up creating multiple copies of logically equivalent pointer types. -- edA-qa mort-ora-y -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- Sign: Please digitally sign your emails. Encrypt: I'm also happy to receive encrypted mail. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 261 bytes Desc: OpenPGP digital signature URL: From hfinkel at anl.gov Sun Jul 28 21:01:14 2013 From: hfinkel at anl.gov (Hal Finkel) Date: Sun, 28 Jul 2013 23:01:14 -0500 (CDT) Subject: [LLVMdev] [icFuzz] Help needed with analyzing randomly generated tests that fail on clang 3.4 trunk In-Reply-To: <94037EC670985B4A85538143FF8D81CD72637766@ORSMSX107.amr.corp.intel.com> Message-ID: <755697242.15071025.1375070474631.JavaMail.root@alcf.anl.gov> ----- Original Message ----- > Hal, > > Just posted a package containing 214 small tests showing bugs in the > latest Clang (3.4 trunk 187225) on MacOS X when compiled at -O2. > http://llvm.org/bugs/show_bug.cgi?id=16431 > > These are new tests different from the previously posted ones, but > their root causes could be the same as before or could actually be > new bugs. Great, thanks! I'll go through them. -Hal > > Cheers, > -moh > > -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu > [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Haghighat, > Mohammad R > Sent: Friday, July 26, 2013 8:19 PM > To: Hal Finkel > Cc: llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] [icFuzz] Help needed with analyzing randomly > generated tests that fail on clang 3.4 trunk > > Hal, > > I ran the failing tests in the attachment to the bug 16431 on the > latest clang trunk (version 3.4 trunk 187225). > http://llvm.org/bugs/show_bug.cgi?id=16431 > > The following tests still fail: > Tests in diff: t10236 t12206 t2581 t6734 t7788 > t7820 t8069 t9982 > All tests in InfLoopInClang: t19193 t22300 t25903 t27872 t33143 > t8543 > > Meanwhile, I'll launch a new run of icFuzz and will post the results > later. 
> > -moh > > -----Original Message----- > From: Hal Finkel [mailto:hfinkel at anl.gov] > Sent: Friday, July 26, 2013 8:33 AM > To: Haghighat, Mohammad R > Cc: llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] [icFuzz] Help needed with analyzing randomly > generated tests that fail on clang 3.4 trunk > > ----- Original Message ----- > > ----- Original Message ----- > > > Great job, Hal! > > > > > > Sure. I'd be happy to run icFuzz and report the fails once these > > > bugs are fixed and thereafter whenever people want new runs. > > > Obviously, this can be automated, but the problem is that icFuzz > > > is > > > not currently open sourced. > > > > I would be happy to see this open sourced, but I think that we can > > work something out regardless. > > > > Also, once we get the current set of things resolved, I think it > > would > > be useful to test running with: > > > > -- -O3, LTO (-O4 or -flto), > > -- -fslp-vectorize, -fslp-vectorize-aggressive (which are actually > > separate optimizations) > > -- -ffast-math (if you can do floating point with tolerances, or > > at > > least -ffinite-math-only), -fno-math-errno (and there are > > obviously a > > whole bunch of non-default code-generation and target options). > > > > Is it feasible to set up runs with different flags? > > > > > Once there's a bug in the compiler, there's really no limit in > > > the > > > number of failing tests that can be generated, so it's more > > > productive to run the generator after the previously reported > > > bugs > > > are fixed. > > > > Agreed. > > > > > > > > We've also seen cases where the results of "clang -O2" are > > > different > > > on Mac vs. Linux/Windows. > > > > I recall an issue related to default settings for FP, and > > differences > > with libm implementation. Are there non-floating-point cases? > > > > > > > > Just let me know when you want a new run. > > > > Will do! > > Mohammad, > > Can you please re-run these now? 
I know that the original > loop-vectorizer bugs causing the miscompiles have been fixed, and > the others also seem to have been resolved as well. > > Thanks again, > Hal > > > > > -Hal > > > > > > > > Cheers, > > > -moh > > > > > > -----Original Message----- > > > From: Hal Finkel [mailto:hfinkel at anl.gov] > > > Sent: Wednesday, June 26, 2013 7:35 AM > > > To: Haghighat, Mohammad R > > > Cc: llvmdev at cs.uiuc.edu; Jim Grosbach > > > Subject: Re: [LLVMdev] [icFuzz] Help needed with analyzing > > > randomly > > > generated tests that fail on clang 3.4 trunk > > > > > > ----- Original Message ----- > > > > > > > > Hi Moh, > > > > > > > > > > > > Thanks for this. I’m really glad to see the work you’re doing > > > > in > > > > this area and believe it will be extremely helpful in improving > > > > the quality of the compiler. > > > > > > > > > > > > -Jim > > > > > > > > > > > > > > > > On Jun 24, 2013, at 4:10 PM, Haghighat, Mohammad R < > > > > mohammad.r.haghighat at intel.com > wrote: > > > > > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > I just submitted a bug report with a package containing 107 > > > > small > > > > test cases that fail on the latest LLVM/clang 3.4 main trunk > > > > (184563). Included are test sources, compilation commands, test > > > > input files, and results at –O0 and –O2 when applicable. > > > > > > > > http://llvm.org/bugs/show_bug.cgi?id=16431 > > > > > > > > These tests have been automatically generated by an internal > > > > tool > > > > at Intel, the Intel Compiler fuzzer, icFuzz. The tests are > > > > typically very small. 
For example, for the following simple > > > > loop > > > > (test > > > > t5702) > > > > on MacOS X, clang at –O2 generates a binary that crashes: > > > > > > > > // Test Loop Interchange > > > > for (j = 2; j < 76; j++) { > > > > for (jm = 1; jm < 30; jm++) { > > > > h[j-1][jm-1] = j + 83; > > > > } > > > > } > > > > > > > > The tests are put in to two categories > > > > - tests that have different runtime outputs when compiled at > > > > -O0 > > > > and > > > > -O2 (this category also includes runtime crashes) > > > > - tests that cause infinite loops in the Clang optimizer > > > > > > > > Many of these failing tests could be due to the same bug, thus > > > > a > > > > much smaller number of root problems are expected. > > > > > > > > Any help with triaging these bugs would be highly appreciated. > > > > > > I've gone through all of the miscompile cases, used bugpoint to > > > reduce them, and opened individual PRs for several distinct bugs. > > > So > > > far we have: PR16455 (loop vectorizer), PR16457 (sccp), PR16460 > > > (instcombine). Thanks again for doing this! Do you plan on > > > repeating > > > this testing on a regular basis? Can it be automated? 
> > > > > > -Hal > > > > > > > > > > > Thanks, > > > > -moh > > > > > > > > _______________________________________________ > > > > LLVM Developers mailing list > > > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > > > > _______________________________________________ > > > > LLVM Developers mailing list > > > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > > > > > > -- > > > Hal Finkel > > > Assistant Computational Scientist > > > Leadership Computing Facility > > > Argonne National Laboratory > > > > > > > -- > > Hal Finkel > > Assistant Computational Scientist > > Leadership Computing Facility > > Argonne National Laboratory > > > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory From vijay.daultani at gmail.com Sat Jul 27 09:02:46 2013 From: vijay.daultani at gmail.com (Vijay Daultani) Date: Sat, 27 Jul 2013 21:32:46 +0530 Subject: [LLVMdev] Require Grammar for converting C to IR Message-ID: Respected Sir/Madam, As I was developing some part of compiler for a project. I require grammar (BNF or EBNF) for converting the C code in the IR as it is not been mentioned any where over your official website. Awaiting for your help. Regards, Vijay Daultani. 
M.Tech student IIT Delhi From b17c0de at gmail.com Sun Jul 28 09:08:25 2013 From: b17c0de at gmail.com (Kal) Date: Sun, 28 Jul 2013 18:08:25 +0200 Subject: [LLVMdev] libcxx support library Message-ID: <51F541F9.70901@gmail.com> Hi, I would like to install clang 3.3 with libc++ on Linux where g++ 4.4 is currently the default/only toolchain. I'm not sure how to choose the best support library: libsupc++/libc++abi/libcxxrt. Concerning libsupc++, I noticed this: http://lists.cs.uiuc.edu/pipermail/llvmbugs/2012-August/024800.html Are there other issues with using libsupc++? Does it even make sense to use libsupc++ 4.4 with libc++ when I need C++11 support? What would be the limitations of using it with C++11, if any? Which of these support libraries would be the most mature? I care most about stability and standards-conformance. Interoperability with g++ is not important. Thanks! -Kal From nicholas at mxc.ca Mon Jul 29 01:27:22 2013 From: nicholas at mxc.ca (Nick Lewycky) Date: Mon, 29 Jul 2013 01:27:22 -0700 Subject: [LLVMdev] Require Grammar for converting C to IR In-Reply-To: References: Message-ID: <51F6276A.3020008@mxc.ca> Vijay Daultani wrote: > Respected Sir/Madam, > > As I was developing some part of compiler for a project. I require grammar > (BNF or EBNF) for converting the C code in the IR as it is not been > mentioned any where over your official website. I don't know what you mean by "converting the C code in the IR". Regardless, LLVM does not have BNF or EBNF forms (or any formalized grammar) for C, nor for LLVM IR. Sorry! Nick From David.Chisnall at cl.cam.ac.uk Mon Jul 29 01:28:28 2013 From: David.Chisnall at cl.cam.ac.uk (David Chisnall) Date: Mon, 29 Jul 2013 09:28:28 +0100 Subject: [LLVMdev] PointerType without body, post-construction set type? In-Reply-To: <51F5E630.4000501@disemia.com> References: <51F5E630.4000501@disemia.com> Message-ID: You don't need to. First you create the struct type. Then you create a pointer type to the struct type.
Then you create the body for the struct type, containing the pointer type (potentially multiple times). David On 29 Jul 2013, at 04:49, edA-qa mort-ora-y wrote: > With StructType you can create an empty structure and then call setBody > later. How can one do the same thing with a PointerType? > > I'm translating a recursive structure which includes pointers to itself. > I end up creating multiple copies of logically equivalent pointer types. > > -- > edA-qa mort-ora-y > -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- > Sign: Please digitally sign your emails. > Encrypt: I'm also happy to receive encrypted mail. > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From nicholas at mxc.ca Mon Jul 29 01:45:20 2013 From: nicholas at mxc.ca (Nick Lewycky) Date: Mon, 29 Jul 2013 01:45:20 -0700 Subject: [LLVMdev] Questions about the semantics for lifetime intrinsics... In-Reply-To: References: Message-ID: <51F62BA0.9000008@mxc.ca> Chandler Carruth wrote: > So, in hacking on mem2reg I noticed that it doesn't actually implement > optimizations enabled by lifetime markers. I thought I might take a stab > at teaching it about them, but I'm left with some questions about the > semantics. Much of this may have been hash out when they were added, and > if so I'll appreciate your help educating me, and maybe we can come up > with improved documentation to cover this. > > First, is there any realistic intent that these be used for heap > pointers? If so, what is the expected use case? As you noticed by the lack of implementation in mem2reg, they're currently only implemented for heap pointers. The use case is for lowering a stack-based language to LLVM IR. 
The language's allocate-space-from-the-"stack" (really in heap) function would use lifetime.start to indicate that the stack slot contains uninitialized memory, and the pop function would use lifetime.end to indicate that the memory is dead for DSE purposes. > Second, if the answer to the first is 'no', then could we remove the > 'start' intrinsic? It seems redundant as the value of an alloca prior to > a store to that alloca is already 'undef'. Lifetime.start and lifetime.end are *almost* the same. There's a long thread on the subject back here: http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-December/057121.html which is particularly interesting for the conversation about replacing To quote my past self, "You can almost entirely model lifetime.start and lifetime.end as being a store of undef to the address. However, they're the tiniest bit stronger. With a store of undef, you can delete stores that precede (with no intervening load) and loads that follow (with no intervening store). On top of that, a start lets you delete loads that precede, and an end lets you delete stores that follow." > Third, what is the semantic model intended for the size argument? The > documentation says: > > """ > The first argument is a constant integer representing the size of the > object, or -1 if it is variable sized. > """ > > Part of this seems a bit confusingly specified -- what does it mean by > "the object"? I assume what it really means is the object read from > memory by a corresponding load, but that in and of itself is confusing > because the pointer passed to this routine is typically not the pointer > loaded. There might be many different loads of different type objects > all corresponding to the same lifetime intrinsic. I agree, speaking of the "object" is out of place here. It's just the length in bytes of lifetime starts or ends. Put another way, it's equivalent to a series of consecutive one-byte starts/ends. 
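[Editor's aside] For concreteness, the markers being discussed look like this in IR (a sketch; the 16-byte buffer is invented, and the first operand is the size argument the question is about — it would be -1 for a variable-sized region):

```llvm
declare void @llvm.lifetime.start(i64, i8* nocapture)
declare void @llvm.lifetime.end(i64, i8* nocapture)

define void @f() {
  %buf = alloca [16 x i8]
  %p = bitcast [16 x i8]* %buf to i8*
  ; loads of %buf before this point may be deleted (start)
  call void @llvm.lifetime.start(i64 16, i8* %p)
  ; ... %buf is live here; its initial contents are undef ...
  call void @llvm.lifetime.end(i64 16, i8* %p)
  ; stores to %buf after this point are dead and DSE may remove them
  ret void
}
```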
> The best way I have of interpreting it is in terms of the 'end' > intrinsic: the results of it is equivalent to that of a notional store > of '[i8 x N] undef' to the provided pointer where N is the size. > However, this becomes truly muddy in the presence of '-1' size which > just means a "variable" size. But how much is variable? Where does the > undef stop? > > I think the whole thing would be somewhat clearer as an intrinsic with > an arbitrary pointer type and a boolean flag for 'is_variable_length'. > If the flag is false (common), the pointee type's store size is the size > of the region whose lifetime is marked. If the flag is true, the pointer > must be an alloca instruction itself with a runtime size, and the entire > alloca's lifetime is marked. Among other benefits, this would make > mem2reg and other analyzes easier and faster by decreasing the bitcast > or gep instructions that result from common frontend lowering patterns. > It also matches more closely the behavior of load and store. I think the -1 case is intended to mean "the whole thing" for users who don't want to lower to a specific number of bytes. Nick From vsp1729 at gmail.com Mon Jul 29 02:55:23 2013 From: vsp1729 at gmail.com (Vikram Singh) Date: Mon, 29 Jul 2013 02:55:23 -0700 (PDT) Subject: [LLVMdev] Destination of callee saved register Message-ID: <1375091723039-59901.post@n5.nabble.com> Hi, In the sparc ABI the arguments are saved by the callee in the caller's stack frame. Q. What should I do to save them in the callee's stack frame itself? For example, by default this is generated: sti r2, -2(fp) sti r3, -3(fp) Instead, how do I generate: sti r2, 4(fp) sti r3, 5(fp) The indices are just for illustration. The point is that in sparc they are negative, which means the slots are in the caller's stack frame, but I want to save the arguments in the callee's stack frame. PS: If I try to add r2, r3, etc. to CalleeSavedRegister() then llvm saves the arguments in both the callee and the caller stack frames!
Help me Vikram -- View this message in context: http://llvm.1065342.n5.nabble.com/Destination-of-callee-saved-register-tp59901.html Sent from the LLVM - Dev mailing list archive at Nabble.com. From vsp1729 at gmail.com Mon Jul 29 04:44:30 2013 From: vsp1729 at gmail.com (Vikram Singh) Date: Mon, 29 Jul 2013 04:44:30 -0700 (PDT) Subject: [LLVMdev] Libcall for double precision comparison. Message-ID: <1375098270958-59902.post@n5.nabble.com> Hi folks, How do I place a libcall for double-precision number comparison? My machine does not have double-precision comparison condition codes, but llvm is deliberately using the single-precision condition code. How do I solve this problem? Regards VSP -- View this message in context: http://llvm.1065342.n5.nabble.com/Libcall-for-double-precision-comparison-tp59902.html Sent from the LLVM - Dev mailing list archive at Nabble.com. From brianherman at gmail.com Mon Jul 29 05:23:54 2013 From: brianherman at gmail.com (Brian Herman) Date: Mon, 29 Jul 2013 07:23:54 -0500 Subject: [LLVMdev] Require Grammar for converting C to IR In-Reply-To: <51F6276A.3020008@mxc.ca> References: <51F6276A.3020008@mxc.ca> Message-ID: I am curious, how do you guys do it anyway? On Mon, Jul 29, 2013 at 3:27 AM, Nick Lewycky wrote: > Vijay Daultani wrote: > >> Respected Sir/Madam, >> >> As I was developing some part of compiler for a project. I require grammar >> (BNF or EBNF) for converting the C code in the IR as it is not been >> mentioned any where over your official website. >> > > I don't know what you mean by "converting the C code in the IR". > > Regardless, LLVM does not have BNF or EBNF forms (or any formalized > grammar) for C, nor for LLVM IR. Sorry! > > Nick > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -- Thanks, Brian Herman college.nfshost.com -------------- next part -------------- An HTML attachment was scrubbed...
URL: From baldrick at free.fr Mon Jul 29 06:00:30 2013 From: baldrick at free.fr (Duncan Sands) Date: Mon, 29 Jul 2013 15:00:30 +0200 Subject: [LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI In-Reply-To: References: Message-ID: <51F6676E.2090800@free.fr> Hi Reid, On 25/07/13 23:38, Reid Kleckner wrote: > Hi LLVM folks, > > To properly implement pass-by-value in the Microsoft C++ ABI, we need to be able > to take the address of an outgoing call argument slot. This is > http://llvm.org/PR5064 . > > Problem > ------- > > On Windows, C structs are pushed right onto the stack in line with the other > arguments. In LLVM, we use byval to model this, and it works for C structs. > However, C++ records are also passed this way, and reusing byval for C++ records > breaks C++ object identity rules. > > In order to implement the ABI properly, we need a way to get the address of the > argument slot *before* we start the call, so that we can either construct the > object in place on the stack or at least call its copy constructor. what does GCC do? Ciao, Duncan. > > This is further complicated by the possibility of nested calls passing arguments by > value. A good general case to think about is a binary tree of calls that take > two arguments by value and return by value: > > struct A { int a; }; > A foo(A, A); > foo(foo(A(), A()), foo(A(), A())); > > To complete the outer call to foo, we have to adjust the stack for its outgoing > arguments before the inner calls to foo, and arrange for the sret pointers to > point to those slots. > > To make this even more complicated, C++ methods are typically callee cleanup > (thiscall), but free functions are caller cleanup (cdecl). > > Features > -------- > > A few weeks ago, I sat down with some folks at Google and we came up with this > proposal, which tries to add the minimum set of LLVM IL features to make this > possible. > > 1. 
Allow alloca instructions to use llvm.stacksave values to indicate scoping. > > This creates an SSA dependence between the alloca instruction and the > stackrestore instruction that prevents optimizers from accidentally reordering > them in ways that don't verify. llvm.stacksave in this case is taking on a role > similar to CALLSEQ_START in the selection dag. > > LLVM can also apply this to dynamic allocas from inline functions to ensure that > optimizers don't move them. > > 2. Add an 'alloca' attribute for parameters. > > Only an alloca value can be passed to a parameter with this attribute. It > cannot be bitcasted or GEPed. An alloca can only be passed in this way once. > It can be passed as a normal pointer to any number of other functions. > > Aside from allocas bounded by llvm.stacksave and llvm.stackrestore calls, there > can be no allocas between the creation of an alloca passed with this attribute > and its associated call. > > 3. Add a stackrestore field to call and invoke instructions. > > This models calling conventions which do their own cleanup, and ensures that > even after optimizations have perturbed the IR, we don't consider the allocas to > be live. For caller cleanup conventions, while the callee may have called > destructors on its arguments, the allocas can be considered live until the stack > restore. > > Example > ------- > > A single call to foo, assuming it is stdcall, would be lowered something like: > > %res = alloca %struct.A > %base = llvm.stacksave() > %arg1 = alloca %struct.A, stackbase %base > %arg2 = alloca %struct.A, stackbase %base > call @A_ctor(%arg1) > call @A_ctor(%arg2) > call x86_stdcallcc @foo(%res sret, %arg1 alloca, %arg2 alloca), stackrestore %base > > If control does not flow through a call or invoke with a stackrestore field, > then manual calls to llvm.stackrestore must be emitted before another call or > invoke can use an 'alloca' argument. The manual stack restore call ends the > lifetime of the allocas. 
This is necessary to handle unwind edges from argument > expression evaluation as well as the case where foo is not callee cleanup. > > Implementation > -------------- > > By starting out with the stack save and restore intrinsics, we can hopefully > approach a slow but working implementation sooner rather than later. The work > should mostly be in the verifier, the IR, its parser, and the x86 backend. > > I don't plan to start working on this immediately, but over the long run this > will be really important to support well. > > --- > > That's all! Please send feedback! This is admittedly a really complicated > feature and I'm sorry for inflicting it on the LLVM community, but it's > obviously beyond my control. > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From devlists at shadowlab.org Mon Jul 29 06:06:49 2013 From: devlists at shadowlab.org (Jean-Daniel Dupas) Date: Mon, 29 Jul 2013 15:06:49 +0200 Subject: [LLVMdev] Require Grammar for converting C to IR In-Reply-To: References: <51F6276A.3020008@mxc.ca> Message-ID: <78173CA6-5449-4BAD-8956-7A8EC720314C@shadowlab.org> Do what? Compile C code? You have to use clang, which is the LLVM C language family frontend. See http://clang.llvm.org/ On 29 Jul 2013, at 14:23, Brian Herman wrote: > I am curious how do you guys do it anyways? > > > On Mon, Jul 29, 2013 at 3:27 AM, Nick Lewycky wrote: > Vijay Daultani wrote: > Respected Sir/Madam, > > As I was developing some part of compiler for a project. I require grammar > (BNF or EBNF) for converting the C code in the IR as it is not been > mentioned any where over your official website. > > I don't know what you mean by "converting the C code in the IR". > > Regardless, LLVM does not have BNF or EBNF forms (or any formalized grammar) for C, nor for LLVM IR. Sorry!
> > Nick > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > -- > > > Thanks, > Brian Herman > college.nfshost.com > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -- Jean-Daniel -------------- next part -------------- An HTML attachment was scrubbed... URL: From anton at korobeynikov.info Mon Jul 29 06:30:12 2013 From: anton at korobeynikov.info (Anton Korobeynikov) Date: Mon, 29 Jul 2013 17:30:12 +0400 Subject: [LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI In-Reply-To: <51F6676E.2090800@free.fr> References: <51F6676E.2090800@free.fr> Message-ID: >> object in place on the stack or at least call its copy constructor. > > > what does GCC do? Nothing. It does not support MSVC ABI. -- With best regards, Anton Korobeynikov Faculty of Mathematics and Mechanics, Saint Petersburg State University From baldrick at free.fr Mon Jul 29 06:40:47 2013 From: baldrick at free.fr (Duncan Sands) Date: Mon, 29 Jul 2013 15:40:47 +0200 Subject: [LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI In-Reply-To: References: <51F6676E.2090800@free.fr> Message-ID: <51F670DF.5080804@free.fr> On 29/07/13 15:30, Anton Korobeynikov wrote: >>> object in place on the stack or at least call its copy constructor. >> >> >> what does GCC do? > Nothing. It does not support MSVC ABI. Maybe we shouldn't either :) So the ABI requires the struct to be pushed on the stack by the caller along with the other parameters, and what's more it requires the caller to execute some code on that copy, in place on the stack, before performing the call. Is that right? Ciao, Duncan. 
From chad.rosier at gmail.com Mon Jul 29 06:50:42 2013 From: chad.rosier at gmail.com (Chad Rosier) Date: Mon, 29 Jul 2013 09:50:42 -0400 Subject: [LLVMdev] Question on optimizeThumb2JumpTables In-Reply-To: <00e401ce87ac$595e73d0$0c1b5b70$@codeaurora.org> References: <00e401ce87ac$595e73d0$0c1b5b70$@codeaurora.org> Message-ID: Hi Jakob, You're the unfortunate soul who last touched the constant island pass, right? Do you happen to have any insight for Daniel? Chad On Tue, Jul 23, 2013 at 9:55 AM, Daniel Stewart wrote: > In looking at the code in > ARMConstantIslandPass.cpp::optimizeThumb2JumpTables(), I see that there is > the following condition for not creating tbb-based jump tables: > > // The instruction should be a tLEApcrel or t2LEApcrelJT; we want > // to delete it as well. > MachineInstr *LeaMI = PrevI; > if ((LeaMI->getOpcode() != ARM::tLEApcrelJT && > LeaMI->getOpcode() != ARM::t2LEApcrelJT) || > LeaMI->getOperand(0).getReg() != BaseReg) > OptOk = false; > > if (!OptOk) > continue; > > I am trying to figure out why the restriction of > LeaMI->getOperand(0).getReg() != BaseReg is there. It seems this is overly > restrictive. For example, here is a case where it succeeds: > > 8944B BB#53: derived from LLVM BB %172 > Live Ins: %R4 %R6 %D8 %Q5 %R9 %R7 %R8 %R10 %R5 %R11 > Predecessors according to CFG: BB#52 > 8976B %R1 = t2LEApcrelJT , 2, pred:14, pred:%noreg > 8992B %R1 = t2ADDrs %R1, %R10, 18, pred:14, pred:%noreg, opt:%noreg > 9004B %LR = t2MOVi 1, pred:14, pred:%noreg, opt:%noreg > 9008B t2BR_JT %R1, %R10, , 2 > > Shrink JT: t2BR_JT %R1, %R10, , 2 > addr: %R1 = t2ADDrs %R1, %R10, 18, pred:14, pred:%noreg, opt:%noreg > lea: %R1 = t2LEApcrelJT , 2, pred:14, pred:%noreg > > From this we see that the BaseReg = R1.
R1 also happens to be the register > used in the t2ADDrs calculation as well as defined by the t2LEApcrelJT > operation. Because R1 is defined by t2LEApcrelJT, the restriction is met. > > However, in the next example, it fails: > > 5808B BB#30: derived from LLVM BB %105 > Live Ins: %R4 %R6 %D8 %Q5 %R9 %R7 %R8 %R10 %R5 %R11 > Predecessors according to CFG: BB#29 > 5840B %R3 = t2LEApcrelJT , 1, pred:14, pred:%noreg > 5856B %R2 = t2ADDrs %R3, %R7, 18, pred:14, pred:%noreg, opt:%noreg > 5872B t2BR_JT %R2, %R7, , 1 > Successors according to CFG: BB#90(17) BB#31(17) BB#32(17) BB#33(17) BB#34(17) BB#51(17) > > Here we see that the BaseReg = R2. But the t2LEApcrelJT instruction > defines R3, not R2. But this should be fine, because the t2ADDrs > instruction takes R3 and defines R2, which is the real base address. > > So my question is: why is the restriction LeaMI->getOperand(0).getReg() != > BaseReg there? Shouldn't the restriction instead ensure that the > register defined by t2LEApcrelJT is also the register used by the t2ADDrs > instruction? It seems this test is overly restrictive. > > Daniel > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -------------- next part -------------- An HTML attachment was scrubbed...
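[Editor's aside] The relaxation Daniel is suggesting — check the register that the address computation actually consumes, rather than requiring the LEA to define BaseReg itself — would look roughly like the following. This is a sketch only: it reuses the names from the quoted snippet, assumes the t2ADDrs is in hand as `AddrMI` (as the "addr:" debug output suggests), and the operand numbering is illustrative.

```
// Require that the LEA defines the register the address computation
// (t2ADDrs) reads, instead of requiring it to define BaseReg directly.
if ((LeaMI->getOpcode() != ARM::tLEApcrelJT &&
     LeaMI->getOpcode() != ARM::t2LEApcrelJT) ||
    LeaMI->getOperand(0).getReg() != AddrMI->getOperand(1).getReg())
  OptOk = false;
```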
URL: From rnk at google.com Mon Jul 29 07:36:30 2013 From: rnk at google.com (Reid Kleckner) Date: Mon, 29 Jul 2013 10:36:30 -0400 Subject: [LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI In-Reply-To: <51F670DF.5080804@free.fr> References: <51F6676E.2090800@free.fr> <51F670DF.5080804@free.fr> Message-ID: On Mon, Jul 29, 2013 at 9:40 AM, Duncan Sands wrote: > On 29/07/13 15:30, Anton Korobeynikov wrote: > >> object in place on the stack or at least call its copy constructor. >>>> >>> >>> >>> what does GCC do? >>> >> Nothing. It does not support MSVC ABI. >> > > Maybe we shouldn't either :) So the ABI requires the struct to be pushed > on the > stack by the caller along with the other parameters, and what's more it > requires > the caller to execute some code on that copy, in place on the stack, before > performing the call. Is that right? > Just calling the copy ctor is the bare minimum needed to support the ABI. cl.exe uses a more efficient lowering that evaluates the arguments right to left directly into the outgoing argument stack slots. It lowers following call to bar to something like: struct A {int x; int y;}; A foo(); void bar(int, A, int); ... bar(0xdead, foo(), 0xbeef); x86_32 pseudo intel asm: push 0xdead sub esp, 8 push esp # sret arg for foo call foo add esp, 4 # clear sret arg push 0xbeef call bar add 16 # clear args With the current proposal, LLVM would generate code that adjusts the stack once to begin the call, and once more to end it. Initially, it would do a blind SP restore, but that can be optimized to a constant SP adjustment. Arguments will still be evaluated left to right and will be stored into the special alloca outgoing argument slots. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tobias at grosser.es Mon Jul 29 07:37:14 2013 From: tobias at grosser.es (Tobias Grosser) Date: Mon, 29 Jul 2013 07:37:14 -0700 Subject: [LLVMdev] [Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass In-Reply-To: <20130729101810.GO11371MdfPADPa@purples> References: <5ce10763.6ff8.14019233ac7.Coremail.tanmx_star@yeah.net> <51F213DB.10700@grosser.es> <1dd45540.5904.140258d029f.Coremail.tanmx_star@yeah.net> <51F5AC61.2080108@grosser.es> <20130729101810.GO11371MdfPADPa@purples> Message-ID: <51F67E1A.6060400@grosser.es> On 07/29/2013 03:18 AM, Sven Verdoolaege wrote: > On Sun, Jul 28, 2013 at 04:42:25PM -0700, Tobias Grosser wrote: >> Sven: In terms of making the behaviour of isl easier to understand, >> it may make sense to fail/assert in case operands have parameters that >> are named identical, but that refer to different pointer values. > > No, you are allowed to have different identifiers with the same name. > I could optionally print the pointer values, but then I'd have > to think about what to do with them when reading a textual > representation of a set with such pointer values in them. Yes, this is how it is today. I wondered if there is actually a need to allow the use of different identifiers with the same name (except all being unnamed?). I personally do not see such a need and would prefer isl to assert/fail in case someone tries to do so. This may avoid confusions as happened here. Do you see a reason why isl should allow this? 
Cheers, Tobias From baldrick at free.fr Mon Jul 29 07:45:14 2013 From: baldrick at free.fr (Duncan Sands) Date: Mon, 29 Jul 2013 16:45:14 +0200 Subject: [LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI In-Reply-To: References: <51F6676E.2090800@free.fr> <51F670DF.5080804@free.fr> Message-ID: <51F67FFA.6080102@free.fr> Hi Reid, On 29/07/13 16:36, Reid Kleckner wrote: > On Mon, Jul 29, 2013 at 9:40 AM, Duncan Sands > wrote: > > On 29/07/13 15:30, Anton Korobeynikov wrote: > > object in place on the stack or at least call its copy constructor. > > > > what does GCC do? > > Nothing. It does not support MSVC ABI. > > > Maybe we shouldn't either :) So the ABI requires the struct to be pushed on the > stack by the caller along with the other parameters, and what's more it requires > the caller to execute some code on that copy, in place on the stack, before > performing the call. Is that right? > > > Just calling the copy ctor is the bare minimum needed to support the ABI. > cl.exe uses a more efficient lowering that evaluates the arguments right to > left directly into the outgoing argument stack slots. It lowers following call > to bar to something like: > > struct A {int x; int y;}; > A foo(); > void bar(int, A, int); > ... > bar(0xdead, foo(), 0xbeef); > > x86_32 pseudo intel asm: > > push 0xdead > sub esp, 8 > push esp # sret arg for foo > call foo > add esp, 4 # clear sret arg > push 0xbeef > call bar I got confused by your example. Is the struct passed on the stack, amongst the other parameters, or can it be allocated somewhere else and a pointer to it passed? Because in your example it isn't clear to me how the return value of foo() is being passed to bar(). Ciao, Duncan. > add 16 # clear args > > With the current proposal, LLVM would generate code that adjusts the stack once > to begin the call, and once more to end it. Initially, it would do a blind SP > restore, but that can be optimized to a constant SP adjustment. 
Arguments will > still be evaluated left to right and will be stored into the special alloca > outgoing argument slots. From mihail.popa at gmail.com Mon Jul 29 07:47:55 2013 From: mihail.popa at gmail.com (Mihail Popa) Date: Mon, 29 Jul 2013 15:47:55 +0100 Subject: [LLVMdev] [PATCH] Add support for ARM modified immediate syntax Message-ID: Hi. Please review the attached patch. It adds support for "modified immediate" syntax form to the relevant ARM instructions. Basically these immediate fields are represented via a pair of values: one being a base value and the other a modifier. Since some of these values may have multiple representations, the ARM architecture specification defines a syntax to guarantee that a specific encoding is obtained. Supporting this syntax is a requirement for any real assembler/disassembler testing since it's the only way to guarantee that a binary can be disassembled and reassembled into an exact copy of its former self. Regards, Mihai -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: LLVM-759.modimm.patch Type: application/octet-stream Size: 24578 bytes Desc: not available URL: From rnk at google.com Mon Jul 29 07:52:06 2013 From: rnk at google.com (Reid Kleckner) Date: Mon, 29 Jul 2013 10:52:06 -0400 Subject: [LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI In-Reply-To: <51F67FFA.6080102@free.fr> References: <51F6676E.2090800@free.fr> <51F670DF.5080804@free.fr> <51F67FFA.6080102@free.fr> Message-ID: On Mon, Jul 29, 2013 at 10:45 AM, Duncan Sands wrote: > Hi Reid, > > > On 29/07/13 16:36, Reid Kleckner wrote: > >> On Mon, Jul 29, 2013 at 9:40 AM, Duncan Sands > > wrote: >> >> On 29/07/13 15:30, Anton Korobeynikov wrote: >> >> object in place on the stack or at least call its copy >> constructor. >> >> >> >> what does GCC do? >> >> Nothing. It does not support MSVC ABI. 
>> >> >> Maybe we shouldn't either :) So the ABI requires the struct to be >> pushed on the >> stack by the caller along with the other parameters, and what's more >> it requires >> the caller to execute some code on that copy, in place on the stack, >> before >> performing the call. Is that right? >> >> >> Just calling the copy ctor is the bare minimum needed to support the ABI. >> cl.exe uses a more efficient lowering that evaluates the arguments >> right to >> left directly into the outgoing argument stack slots. It lowers >> following call >> to bar to something like: >> >> struct A {int x; int y;}; >> A foo(); >> void bar(int, A, int); >> ... >> bar(0xdead, foo(), 0xbeef); >> >> x86_32 pseudo intel asm: >> >> push 0xdead >> sub esp, 8 >> push esp # sret arg for foo >> call foo >> add esp, 4 # clear sret arg >> push 0xbeef >> call bar >> > > I got confused by your example. Is the struct passed on the stack, > amongst the > other parameters, or can it be allocated somewhere else and a pointer to it > passed? Because in your example it isn't clear to me how the return value > of > foo() is being passed to bar(). > Yes, the struct is passed amongst the other parameters. LLVM already supports passing C structs this way using byval, but it emits a memcpy in the backend, which breaks C++ objects. foo() is returning an A struct via the usual hidden sret parameter. The "push esp" is taking the address of the struct's outgoing arg slot and passing it to foo(). -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From baldrick at free.fr Mon Jul 29 07:59:05 2013 From: baldrick at free.fr (Duncan Sands) Date: Mon, 29 Jul 2013 16:59:05 +0200 Subject: [LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI In-Reply-To: References: <51F6676E.2090800@free.fr> <51F670DF.5080804@free.fr> <51F67FFA.6080102@free.fr> Message-ID: <51F68339.30101@free.fr> Hi Reid, > struct A {int x; int y;}; > A foo(); > void bar(int, A, int); > ... > bar(0xdead, foo(), 0xbeef); > > x86_32 pseudo intel asm: > > push 0xdead > sub esp, 8 > push esp # sret arg for foo > call foo > add esp, 4 # clear sret arg > push 0xbeef > call bar > > > I got confused by your example. Is the struct passed on the stack, amongst the > other parameters, or can it be allocated somewhere else and a pointer to it > passed? Because in your example it isn't clear to me how the return value of > foo() is being passed to bar(). > > > Yes, the struct is passed amongst the other parameters. LLVM already supports > passing C structs this way using byval, but it emits a memcpy in the backend, > which breaks C++ objects. > > foo() is returning an A struct via the usual hidden sret parameter. The "push > esp" is taking the address of the struct's outgoing arg slot and passing it to > foo(). indeed! This seems perfectly obvious now that you explained it, sorry for being dense. Ciao, Duncan. From eliben at google.com Mon Jul 29 08:56:11 2013 From: eliben at google.com (Eli Bendersky) Date: Mon, 29 Jul 2013 08:56:11 -0700 Subject: [LLVMdev] Require Grammar for converting C to IR In-Reply-To: References: Message-ID: On Sat, Jul 27, 2013 at 9:02 AM, Vijay Daultani wrote: > Respected Sir/Madam, > > As I was developing some part of compiler for a project. I require grammar > (BNF or EBNF) for converting the C code in the IR as it is not been > mentioned any where over your official website. > > Awaiting for your help. 
> > Hi Vijay, If you are asking about compiling C into LLVM IR - take a look at Clang: http://clang.llvm.org/. Clang is a C, C++ and ObjC front-end for LLVM. Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From kparzysz at codeaurora.org Mon Jul 29 09:05:21 2013 From: kparzysz at codeaurora.org (Krzysztof Parzyszek) Date: Mon, 29 Jul 2013 11:05:21 -0500 Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> References: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> Message-ID: <51F692C1.2080901@codeaurora.org> On 7/16/2013 11:38 PM, Andrew Trick wrote: > Since introducing the new TargetTransformInfo analysis, there has been some confusion over the role of target heuristics in IR passes. A few patches have led to interesting discussions. > > To centralize the discussion, until we get some documentation and better APIs in place, let me throw out an oversimplified Straw Man for a new pass pipline. It serves two purposes: (1) an overdue reorganization of the pass pipeline (2) a formalization of the role of TargetTransformInfo. > > --- > Canonicalization passes are designed to normalize the IR in order to expose opportunities to subsequent machine independent passes. This simplifies writing machine independent optimizations and improves the quality of the compiler. > > An important property of these passes is that they are repeatable. The may be invoked multiple times after inlining and should converge to a canonical form. They should not destructively transform the IR in a way that defeats subsequent analysis. > > Canonicalization passes can make use of data layout and are affected by ABI, but are otherwise target independent. Adding target specific hooks to these passes can defeat the purpose of canonical IR. 
> > IR Canonicalization Pipeline: > > Function Passes { > SimplifyCFG > SROA-1 > EarlyCSE > } > Call-Graph SCC Passes { > Inline > Function Passes { > EarlyCSE > SimplifyCFG > InstCombine > Early Loop Opts { > LoopSimplify > Rotate (when obvious) > Full-Unroll (when obvious) > } > SROA-2 > InstCombine > GVN > Reassociate > Generic Loop Opts { > LICM (Rotate on-demand) > Unswitch > } > SCCP > InstCombine > JumpThreading > CorrelatedValuePropagation > AggressiveDCE > } > } > I'm a bit late to this, but the examples of the "generic loop opts" above are really better left until later. They have a potential to obscure the code and make other loop optimizations harder. Specifically, there has to be a place where loop nest optimizations can be done (such as loop interchange or unroll-and-jam). There is also array expansion and loop distribution, which can be highly target-dependent in terms of their applicability. I don't know if TTI could provide enough details to account for all circumstances that would motivate such transformations, but assuming that it could, there still needs to be a room left for it in the design. On a different, but related note---one thing I've asked recently was about the "proper" solution for recognizing target specific loop idioms. On Hexagon, we have a builtin functions that handle certain specific loop patterns. In order to separate the target-dependent code from the target-independent, we would basically have to replicate the loop idiom recognition in our own target-specific pass. Not only that, but it would have to run before the loops may be subjected to other optimizations that could obfuscate the opportunity. To solve this, I was thinking about having target-specific hooks in the idiom recognition code, that could transform a given loop in the target's own way. Still, that would imply target-specific transformations running before the "official" lowering code. -K -- Qualcomm Innovation Center, Inc. 
is a member of Code Aurora Forum, hosted by The Linux Foundation From kledzik at apple.com Mon Jul 29 09:24:07 2013 From: kledzik at apple.com (Nick Kledzik) Date: Mon, 29 Jul 2013 09:24:07 -0700 Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections In-Reply-To: References: <51F1664F.1040003@codeaurora.org> <51F19211.6050403@codeaurora.org> Message-ID: <0E723450-43B9-4E32-BFE6-EEF5468128A2@apple.com> On Jul 25, 2013, at 2:10 PM, Rui Ueyama wrote: > Is there any reason -ffunction-sections and -fdata-sections wouldn't work? If it'll work, it may be be better to say "if you want to get a better linker output use these options", rather than defining new ELF section. >From my understanding, -ffunction-sections is a good semantic match. But it introduces a lot of bloat in the .o file which the linker must process. For reference, with mach-o we just added a flag to the overall .o file that says all sections are "safe". The compiler always generates safe object files (unless there is inline code with non-local labels) and always sets the flag. Hand written assembly files did not have the flag by default, but savvy assembly programmers can set it. -Nick From eliben at google.com Mon Jul 29 09:24:03 2013 From: eliben at google.com (Eli Bendersky) Date: Mon, 29 Jul 2013 09:24:03 -0700 Subject: [LLVMdev] IntrinsicLowering::AddPrototypes In-Reply-To: <51F59500.3070106@mips.com> References: <51F58EDF.3030101@mips.com> <51F59500.3070106@mips.com> Message-ID: On Sun, Jul 28, 2013 at 3:02 PM, Reed Kotler wrote: > Ooops... Ignore this previous mail. > > The problem still exists with this change. > > > On 07/28/2013 02:36 PM, reed kotler wrote: > >> It seems that several intrinsics are missing from this routine. >> >> In particular, floor, which was causing problems in the mips16 port. >> >> Is there some reason to not add the ones that are missing? >> >> For example, adding the following fixed my problem with floor. 
>> >> case Intrinsic::floor: >> EnsureFPIntrinsicsExist(M, I, "floorf", "floor", "floor"); >> break; >> > > Note that this code is being used mainly (only?) by the Interpreter. Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From kparzysz at codeaurora.org Mon Jul 29 09:24:51 2013 From: kparzysz at codeaurora.org (Krzysztof Parzyszek) Date: Mon, 29 Jul 2013 11:24:51 -0500 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> Message-ID: <51F69753.50300@codeaurora.org> On 7/24/2013 11:37 PM, Chris Lattner wrote: > > How about this: keep the jist of the current API, but drop the > "warning"- or "error"-ness of the API. Instead, the backend just > includes an enum value (plus string message for extra data). The > frontend makes the decision of how to render the diagnostic (or not, > dropping them is fine) along with how to map them onto warning/error or > whatever concepts they use. Also, having centralized handling of compiler messages has the advantage that it integrates with the mechanism of suppressing specific messages, or changing their severity. For example, a user may want to consider all warnings as errors, except a few specific examples, which would be suppressed. -K -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation From stoklund at 2pi.dk Mon Jul 29 09:25:51 2013 From: stoklund at 2pi.dk (Jakob Stoklund Olesen) Date: Mon, 29 Jul 2013 09:25:51 -0700 Subject: [LLVMdev] Question on optimizeThumb2JumpTables In-Reply-To: References: <00e401ce87ac$595e73d0$0c1b5b70$@codeaurora.org> Message-ID: On Jul 29, 2013, at 6:50 AM, Chad Rosier wrote: > Hi Jakob, > You're the unfortunate soul who last touched the constant island pass, right? Do you happen to have any insight for Daniel? Sorry, no. I don't remember working with that particular bit of code. 
You could try digging through the commit logs. Thanks, /jakob > On Tue, Jul 23, 2013 at 9:55 AM, Daniel Stewart wrote: > In looking at the code in ARMConstantislandPass.cpp::optimizeThumb2JumpTables(), I see that there is the following condition for not creating tbb-based jump tables: > > > > // The instruction should be a tLEApcrel or t2LEApcrelJT; we want > > // to delete it as well. > > MachineInstr *LeaMI = PrevI; > > if ((LeaMI->getOpcode() != ARM::tLEApcrelJT && > > LeaMI->getOpcode() != ARM::t2LEApcrelJT) || > > LeaMI->getOperand(0).getReg() != BaseReg) > > OptOk = false; > > > > if (!OptOk) > > continue; > > > > I am trying to figure out why the restriction of LeaMI->getOperand(0).getReg() != BaseReg is there. It seems this is overly restrictive. For example, here is a case where it succeeds: > > > > 8944B BB#53: derived from LLVM BB %172 > > Live Ins: %R4 %R6 %D8 %Q5 %R9 %R7 %R8 %R10 %R5 %R11 > > Predecessors according to CFG: BB#52 > > 8976B %R1 = t2LEApcrelJT , 2, pred:14, pred:%noreg > > 8992B %R1 = t2ADDrs %R1, %R10, 18, pred:14, pred:%noreg, opt:%noreg > > 9004B %LR = t2MOVi 1, pred:14, pred:%noreg, opt:%noreg > > 9008B t2BR_JT %R1, %R10, , 2 > > > > Shrink JT: t2BR_JT %R1, %R10, , 2 > > addr: %R1 = t2ADDrs %R1, %R10, 18, pred:14, pred:%noreg, opt:%noreg > > lea: %R1 = t2LEApcrelJT , 2, pred:14, pred:%noreg > > > > > > From this we see that the BaseReg = R1. R1 also happens to be the register used in the t2ADDrs calculation as well as defined by the t2LEApcrelJT operation. Because R1 is defined by t2LEApcrelJT, the restriction is met. 
> > > > However, in the next example, it fails: > > > > 5808B BB#30: derived from LLVM BB %105 > > Live Ins: %R4 %R6 %D8 %Q5 %R9 %R7 %R8 %R10 %R5 %R11 > > Predecessors according to CFG: BB#29 > > 5840B %R3 = t2LEApcrelJT , 1, pred:14, pred:%noreg > > 5856B %R2 = t2ADDrs %R3, %R7, 18, pred:14, pred:%noreg, opt:%noreg > > 5872B t2BR_JT %R2, %R7, , 1 > > Successors according to CFG: BB#90(17) BB#31(17) BB#32(17) BB#33(17) BB#34(17) BB#51(17) > > > > Here we see that the BaseReg = R2. But the t2LEApcrelJT instruction defines R3, not R2. But this should be fine, because the t2ADDrs instruction takes R3 and defines R2, which is the real base address. > > > > So my question is why is the restriction LeaMI->getOperand(0).getReg() != BaseReg there? Shouldn’t the restriction be trying to ensure that the register defined by t2LEApcrelJT also be the register used by the t2ADDrs instruction? It seems this test is being overly restrictive. > > > > Daniel > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tobias at grosser.es Mon Jul 29 09:27:27 2013 From: tobias at grosser.es (Tobias Grosser) Date: Mon, 29 Jul 2013 09:27:27 -0700 Subject: [LLVMdev] [Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass In-Reply-To: <20130729161501.GS11371MdfPADPa@purples> References: <5ce10763.6ff8.14019233ac7.Coremail.tanmx_star@yeah.net> <51F213DB.10700@grosser.es> <1dd45540.5904.140258d029f.Coremail.tanmx_star@yeah.net> <51F5AC61.2080108@grosser.es> <20130729101810.GO11371MdfPADPa@purples> <51F67E1A.6060400@grosser.es> <20130729161501.GS11371MdfPADPa@purples> Message-ID: <51F697EF.10405@grosser.es> On 07/29/2013 09:15 AM, Sven Verdoolaege wrote: > On Mon, Jul 29, 2013 at 07:37:14AM -0700, Tobias Grosser wrote: >> On 07/29/2013 03:18 AM, Sven Verdoolaege wrote: >>> On Sun, Jul 28, 2013 at 04:42:25PM -0700, Tobias Grosser wrote: >>>> Sven: In terms of making the behaviour of isl easier to understand, >>>> it may make sense to fail/assert in case operands have parameters that >>>> are named identical, but that refer to different pointer values. >>> >>> No, you are allowed to have different identifiers with the same name. >>> I could optionally print the pointer values, but then I'd have >>> to think about what to do with them when reading a textual >>> representation of a set with such pointer values in them. >> >> Yes, this is how it is today. > > No, the pointer values are currently not printed. I was referring to the first sentence. I do not think printing pointer values is what we want. It would make the output unpredictable not only when address space randomisation is involved. >> I wondered if there is actually a need to >> allow the use of different identifiers with the same name (except all being >> unnamed?). I personally do not see such a need and would prefer isl to >> assert/fail in case someone tries to do so. This may avoid confusions as >> happened here. Do you see a reason why isl should allow this? 
> > Removing this feature would break existing users. Even if it would, the benefits for future users may outweigh this. Also, are you aware of a user that actually breaks? Anyway, on the Polly side we know the behaviour and can handle it. So this is nothing I am very strong about. I just mentioned it as it seemed to be a good idea. Cheers, Tobias From shankare at codeaurora.org Mon Jul 29 10:09:47 2013 From: shankare at codeaurora.org (Shankar Easwaran) Date: Mon, 29 Jul 2013 12:09:47 -0500 Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections In-Reply-To: <0E723450-43B9-4E32-BFE6-EEF5468128A2@apple.com> References: <51F1664F.1040003@codeaurora.org> <51F19211.6050403@codeaurora.org> <0E723450-43B9-4E32-BFE6-EEF5468128A2@apple.com> Message-ID: <51F6A1DB.5020909@codeaurora.org> On 7/29/2013 11:24 AM, Nick Kledzik wrote: > On Jul 25, 2013, at 2:10 PM, Rui Ueyama wrote: >> Is there any reason -ffunction-sections and -fdata-sections wouldn't work? If it'll work, it may be better to say "if you want to get a better linker output use these options", rather than defining a new ELF section. > >From my understanding, -ffunction-sections is a good semantic match. But it introduces a lot of bloat in the .o file which the linker must process. > > For reference, with mach-o we just added a flag to the overall .o file that says all sections are "safe". The compiler always generates safe object files (unless there is inline code with non-local labels) and always sets the flag. Hand written assembly files did not have the flag by default, but savvy assembly programmers can set it. We could set this flag for ELF too in the ELF header, but it would not conform to the ELF ABI. To account for safe sections, we should just create an additional section in the ELF (gcc creates a great many sections to handle executable stack and for LTO). This would just be another section to dictate what sections are safe. 
Isn't it better to have this flag set for every section in Darwin too? It makes it flexible. I am not sure about the ABI concerns on Darwin though. Thanks Shankar Easwaran -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation From echristo at gmail.com Mon Jul 29 10:34:40 2013 From: echristo at gmail.com (Eric Christopher) Date: Mon, 29 Jul 2013 10:34:40 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: <51F69753.50300@codeaurora.org> References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> <51F69753.50300@codeaurora.org> Message-ID: On Mon, Jul 29, 2013 at 9:24 AM, Krzysztof Parzyszek wrote: > On 7/24/2013 11:37 PM, Chris Lattner wrote: >> >> >> How about this: keep the gist of the current API, but drop the >> "warning"- or "error"-ness of the API. Instead, the backend just >> includes an enum value (plus string message for extra data). The >> frontend makes the decision of how to render the diagnostic (or not, >> dropping them is fine) along with how to map them onto warning/error or >> whatever concepts they use. > > > Also, having centralized handling of compiler messages has the advantage > that it integrates with the mechanism of suppressing specific messages, or > changing their severity. For example, a user may want to consider all > warnings as errors, except a few specific examples, which would be > suppressed. > This is why the front end/caller into the backend should handle the actual diagnostic since it will have that knowledge and why we're working on it from that perspective. -eric From kparzysz at codeaurora.org Mon Jul 29 10:39:39 2013 From: kparzysz at codeaurora.org (Krzysztof Parzyszek) Date: Mon, 29 Jul 2013 12:39:39 -0500 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. 
In-Reply-To: References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> <51F69753.50300@codeaurora.org> Message-ID: <51F6A8DB.6040707@codeaurora.org> On 7/29/2013 12:34 PM, Eric Christopher wrote: > > This is why the front end/caller into the backend should handle the > actual diagnostic since it will have that knowledge and why we're > working on it from that perspective. I haven't read all of the posts in detail, but wasn't it already demonstrated as insufficient? What about messages about optimization hints---those generally cannot be detected by the front-end. -K -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation From tonic at nondot.org Mon Jul 29 10:47:57 2013 From: tonic at nondot.org (Tanya Lattner) Date: Mon, 29 Jul 2013 10:47:57 -0700 Subject: [LLVMdev] Announcement: llvm.org changing name servers (Complete) In-Reply-To: <617D8BE0-D0BB-4B67-B735-C7AF6C9B270D@nondot.org> References: <617D8BE0-D0BB-4B67-B735-C7AF6C9B270D@nondot.org> Message-ID: <21EF690A-6BC8-41FB-83A4-7D8856D59682@nondot.org> As of Friday evening, the name servers have been changed. Things appear to be working correctly, but if you have any issues, please email me directly. Thanks, Tanya On Jul 26, 2013, at 2:03 PM, Tanya Lattner wrote: > Just letting everyone know that llvm.org will be changing name servers over the next few days to a week. You may experience some strangeness as any bugs are worked out but hopefully you won't notice at all. I will send email once the transition is complete. > > Thanks, > Tanya From echristo at gmail.com Mon Jul 29 10:47:52 2013 From: echristo at gmail.com (Eric Christopher) Date: Mon, 29 Jul 2013 10:47:52 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. 
In-Reply-To: <51F6A8DB.6040707@codeaurora.org> References: <4D9E6E3A-42A3-4DFD-8A40-A7A797E34ACD@apple.com> <6A3B9D20-8241-44FA-8EF5-08C163777B04@apple.com> <51F69753.50300@codeaurora.org> <51F6A8DB.6040707@codeaurora.org> Message-ID: On Mon, Jul 29, 2013 at 10:39 AM, Krzysztof Parzyszek wrote: > On 7/29/2013 12:34 PM, Eric Christopher wrote: >> >> >> This is why the front end/caller into the backend should handle the >> actual diagnostic since it will have that knowledge and why we're >> working on it from that perspective. > > > I haven't read all of the posts in detail, but wasn't it already > demonstrated as insufficient? What about messages about optimization > hints---those generally cannot be detected by the front-end. > You misread what I said. I said the actual diagnostic... not all parts of it. The information should still be passed via callback. You should read the whole thread. -eric From dlmeetei at gmail.com Mon Jul 29 11:01:00 2013 From: dlmeetei at gmail.com (Devchandra L Meetei) Date: Mon, 29 Jul 2013 23:31:00 +0530 Subject: [LLVMdev] Require Grammar for converting C to IR In-Reply-To: References: Message-ID: Seems that Vijay is asking about converting C program to LLVM IR On Mon, Jul 29, 2013 at 9:26 PM, Eli Bendersky wrote: > > > > On Sat, Jul 27, 2013 at 9:02 AM, Vijay Daultani wrote: > >> Respected Sir/Madam, >> >> As I was developing some part of compiler for a project. I require grammar >> (BNF or EBNF) for converting the C code in the IR as it is not been >> mentioned any where over your official website. >> >> Awaiting for your help. >> >> > Hi Vijay, > > If you are asking about compiling C into LLVM IR - take a look at Clang: > http://clang.llvm.org/. Clang is a C, C++ and ObjC front-end for LLVM. 
> > Eli > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -- Warm Regards --Dev OpenPegasus Developer/Committer (\__/) (='.'=) This is Bunny. Copy and paste bunny (")_(") to help him gain world domination. -------------- next part -------------- An HTML attachment was scrubbed... URL: From nrotem at apple.com Mon Jul 29 11:23:58 2013 From: nrotem at apple.com (Nadav Rotem) Date: Mon, 29 Jul 2013 11:23:58 -0700 Subject: [LLVMdev] Enabling the SLP-vectorizer by default for -O3 In-Reply-To: References: <4A0723D2-9E95-4CAF-9E83-0B33EFCA3ECE@apple.com> Message-ID: <1A62205A-6544-421E-B965-29C7779E877E@apple.com> On Jul 28, 2013, at 12:20 AM, Chandler Carruth wrote: > That said, why -O3? I think we should just enable this across the board, as it doesn't seem to cause any size regression under any mode, and the compile time hit is really low. I agree. I think that it would be a good idea to enable it for -Os and -O2, but I’d like to make one step at a time. Thanks, Nadav -------------- next part -------------- An HTML attachment was scrubbed... URL: From jdamon at accesio.com Mon Jul 29 11:28:49 2013 From: jdamon at accesio.com (Jimi Damon) Date: Mon, 29 Jul 2013 11:28:49 -0700 Subject: [LLVMdev] llvm-g++ 4.6.4 unable to compile simple shared library on Ubuntu 12.04 x86_64 Message-ID: <51F6B461.4000006@accesio.com> Hi, I am trying to release a Makefile for building my company's software that will be flexible enough to use the llvm suite of compilers to build shared libraries for talking to USB peripherals. 
The problem that I am having is that while I am able to build a shared library using llvm-gcc , the llvm-g++ compiler is giving me error messages saying " relocation R_X86_64_PC32 against undefined symbol `__morestack' can not be used when making a shared object; recompile with -fPIC Here's my configuration on my local machine: cat /etc/issue Ubuntu-12.04 uname -a Linux localhost 3.2.0-48-generic #74-Ubuntu SMP Thu Jun 6 19:43:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux dpkg -l | grep llvm ii libllvm2.7 2.7-0ubuntu1 Low-Level Virtual Machine (LLVM) (runtime library) ii libllvm2.9 2.9+dfsg-3ubuntu4 Low-Level Virtual Machine (LLVM), runtime library ii libllvm3.0 3.0-4ubuntu1 Low-Level Virtual Machine (LLVM), runtime library ii libllvm3.0:i386 3.0-4ubuntu1 Low-Level Virtual Machine (LLVM), runtime library ii llvm 2.9-7 Low-Level Virtual Machine (LLVM) ii llvm-2.9 2.9+dfsg-3ubuntu4 Low-Level Virtual Machine (LLVM) ii llvm-2.9-dev 2.9+dfsg-3ubuntu4 Low-Level Virtual Machine (LLVM), libraries and headers ii llvm-2.9-runtime 2.9+dfsg-3ubuntu4 Low-Level Virtual Machine (LLVM), bytecode interpreter ii llvm-3.0 3.0-4ubuntu1 Low-Level Virtual Machine (LLVM) ii llvm-3.0-dev 3.0-4ubuntu1 Low-Level Virtual Machine (LLVM), libraries and headers ii llvm-3.0-runtime 3.0-4ubuntu1 Low-Level Virtual Machine (LLVM), bytecode interpreter ii llvm-dev 2.9-7 Low-Level Virtual Machine (LLVM), libraries and headers ii llvm-gcc-4.6 3.0-3 C front end for LLVM C/C++ compiler ii llvm-runtime 2.9-7 Low-Level Virtual Machine (LLVM), bytecode interpreter I have a very simple example that appears to illustrate this problem /******** squared.h BEGIN *****/ int squared(int val); /********* END *****/ /******* squared.c BEGIN ****/ int squared(int val) { return val*val; } /********* END ****/ /**** main.c BEGIN *****/ #include int main(int argc,char *argv[] ) { int i = 0; for( i = 0; i < 10 ; i ++ ) { printf("%d\n",squared(i)); } } /**** END *******/ Using llvm-gcc works just fine: llvm-gcc -fPIC -c -o 
squared.o squared.c llvm-gcc -shared -o libsquared.so squared.o llvm-gcc -L. -o main main.c -lsquared LD_LIBRARY_PATH=. ./main 0 1 4 9 16 25 36 49 64 81 The creation of the shared library fails when using llvm-g++ llvm-g++ -fPIC -c -o squared.o squared.c llvm-g++ -shared -o libsquared.so squared.o /usr/bin/ld: squared.o: relocation R_X86_64_PC32 against undefined symbol `__morestack' can not be used when making a shared object; recompile with -fPIC This exact compile works on a 32-bit version of Ubuntu 12.04, i686 cat /etc/issue Ubuntu 12.04.2 LTS \n \l uname -a Linux gdbserver 3.5.0-23-generic #35~precise1-Ubuntu SMP Fri Jan 25 17:15:33 UTC 2013 i686 i686 i386 GNU/Linux llvm-g++ -fPIC -c -o squared.o squared.c llvm-g++ -shared -o libsquared.so squared.o This succeeds and I am able to use the shared library to compile the main program. Any ideas for how to get past this problem with llvm-g++? Thanks -Jimi From rkotler at mips.com Mon Jul 29 11:36:42 2013 From: rkotler at mips.com (reed kotler) Date: Mon, 29 Jul 2013 11:36:42 -0700 Subject: [LLVMdev] IntrinsicLowering::AddPrototypes In-Reply-To: References: <51F58EDF.3030101@mips.com> <51F59500.3070106@mips.com> Message-ID: <51F6B63A.7030909@mips.com> On 07/29/2013 09:24 AM, Eli Bendersky wrote: > > > > On Sun, Jul 28, 2013 at 3:02 PM, Reed Kotler > wrote: > > Ooops... Ignore this previous mail. > > The problem still exists with this change. > > > On 07/28/2013 02:36 PM, reed kotler wrote: > > It seems that several intrinsics are missing from this routine. > > In particular, floor, which was causing problems in the mips16 > port. > > Is there some reason to not add the ones that are missing? > > For example, adding the following fixed my problem with floor. > > case Intrinsic::floor: > EnsureFPIntrinsicsExist(M, I, "floorf", "floor", > "floor"); > break; > > > > Note that this code is being used mainly (only?) by the Interpreter. > > Eli Yes. 
I don't know the source of this problem with some intrinsics being declared improperly. Sin and cos are okay, but floor and trunc are not. I have a workaround that I will implement today. For mips16 pic mode I need to make calls to helper functions for functions with certain prototypes. I intend to move this logic to an earlier place where I am operating on Clang IR, but for now it is being handled during call lowering and I rely on Args and RetTy being correct. -------------- next part -------------- An HTML attachment was scrubbed... URL: From grosbach at apple.com Mon Jul 29 12:39:04 2013 From: grosbach at apple.com (Jim Grosbach) Date: Mon, 29 Jul 2013 12:39:04 -0700 Subject: [LLVMdev] Enabling the SLP-vectorizer by default for -O3 In-Reply-To: <1A62205A-6544-421E-B965-29C7779E877E@apple.com> References: <4A0723D2-9E95-4CAF-9E83-0B33EFCA3ECE@apple.com> <1A62205A-6544-421E-B965-29C7779E877E@apple.com> Message-ID: <99897FE3-5C07-4702-8DE8-DF76DB551609@apple.com> On Jul 29, 2013, at 11:23 AM, Nadav Rotem wrote: > > On Jul 28, 2013, at 12:20 AM, Chandler Carruth wrote: > >> That said, why -O3? I think we should just enable this across the board, as it doesn't seem to cause any size regression under any mode, and the compile time hit is really low. > > > I agree. I think that it would be a good idea to enable it for -Os and -O2, but I’d like to make one step at a time. > These results are really excellent. They’re on Intel, I assume, right? What do the ARM numbers look like? Before enabling by default, we should make sure that the results are comparable there as well. -Jim > Thanks, > Nadav > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Chuck.Caldarale at unisys.com Mon Jul 29 12:41:42 2013 From: Chuck.Caldarale at unisys.com (Caldarale, Charles R) Date: Mon, 29 Jul 2013 14:41:42 -0500 Subject: [LLVMdev] Require Grammar for converting C to IR In-Reply-To: References: Message-ID: <99C8B2929B39C24493377AC7A121E21FC5AE58675B@USEA-EXCH8.na.uis.unisys.com> > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Devchandra L Meetei > Subject: Re: [LLVMdev] Require Grammar for converting C to IR > > If you are asking about compiling C into LLVM IR - take a look at Clang: > > http://clang.llvm.org/. Clang is a C, C++ and ObjC front-end for LLVM. > Seems that Vijay is asking about converting C program to LLVM IR Which is exactly what clang does, hence the previous answer from Eli. - Chuck From grosbach at apple.com Mon Jul 29 12:56:22 2013 From: grosbach at apple.com (Jim Grosbach) Date: Mon, 29 Jul 2013 12:56:22 -0700 Subject: [LLVMdev] Question on optimizeThumb2JumpTables In-Reply-To: References: <00e401ce87ac$595e73d0$0c1b5b70$@codeaurora.org> Message-ID: That code is probably mine originally, but it’s been a very long time since I touched the TBB/TBH stuff (4 years or so, IIRC), and I have no recollection of the specifics of that particular restriction. It’s entirely possible it is indeed too restrictive. -Jim On Jul 29, 2013, at 9:25 AM, Jakob Stoklund Olesen wrote: > > On Jul 29, 2013, at 6:50 AM, Chad Rosier wrote: > >> Hi Jakob, >> You're the unfortunate soul who last touched the constant island pass, right? Do you happen to have any insight for Daniel? > > Sorry, no. I don't remember working with that particular bit of code. You could try digging through the commit logs. 
>
> Thanks,
> /jakob
>
>
>> On Tue, Jul 23, 2013 at 9:55 AM, Daniel Stewart wrote:
>> In looking at the code in ARMConstantIslandPass.cpp::optimizeThumb2JumpTables(), I see that there is the following condition for not creating tbb-based jump tables:
>>
>> // The instruction should be a tLEApcrel or t2LEApcrelJT; we want
>> // to delete it as well.
>> MachineInstr *LeaMI = PrevI;
>> if ((LeaMI->getOpcode() != ARM::tLEApcrelJT &&
>> LeaMI->getOpcode() != ARM::t2LEApcrelJT) ||
>> LeaMI->getOperand(0).getReg() != BaseReg)
>> OptOk = false;
>>
>> if (!OptOk)
>> continue;
>>
>> I am trying to figure out why the restriction of LeaMI->getOperand(0).getReg() != BaseReg is there. It seems this is overly restrictive. For example, here is a case where it succeeds:
>>
>> 8944B BB#53: derived from LLVM BB %172
>> Live Ins: %R4 %R6 %D8 %Q5 %R9 %R7 %R8 %R10 %R5 %R11
>> Predecessors according to CFG: BB#52
>> 8976B %R1 = t2LEApcrelJT , 2, pred:14, pred:%noreg
>> 8992B %R1 = t2ADDrs %R1, %R10, 18, pred:14, pred:%noreg, opt:%noreg
>> 9004B %LR = t2MOVi 1, pred:14, pred:%noreg, opt:%noreg
>> 9008B t2BR_JT %R1, %R10, , 2
>>
>> Shrink JT: t2BR_JT %R1, %R10, , 2
>> addr: %R1 = t2ADDrs %R1, %R10, 18, pred:14, pred:%noreg, opt:%noreg
>> lea: %R1 = t2LEApcrelJT , 2, pred:14, pred:%noreg
>>
>> From this we see that the BaseReg = R1. R1 also happens to be the register used in the t2ADDrs calculation as well as defined by the t2LEApcrelJT operation. Because R1 is defined by t2LEApcrelJT, the restriction is met.
>>
>> However, in the next example, it fails:
>>
>> 5808B BB#30: derived from LLVM BB %105
>> Live Ins: %R4 %R6 %D8 %Q5 %R9 %R7 %R8 %R10 %R5 %R11
>> Predecessors according to CFG: BB#29
>> 5840B %R3 = t2LEApcrelJT , 1, pred:14, pred:%noreg
>> 5856B %R2 = t2ADDrs %R3, %R7, 18, pred:14, pred:%noreg, opt:%noreg
>> 5872B t2BR_JT %R2, %R7, , 1
>> Successors according to CFG: BB#90(17) BB#31(17) BB#32(17) BB#33(17) BB#34(17) BB#51(17)
>>
>> Here we see that the BaseReg = R2. But the t2LEApcrelJT instruction defines R3, not R2. But this should be fine, because the t2ADDrs instruction takes R3 and defines R2, which is the real base address.
>>
>> So my question is why is the restriction LeaMI->getOperand(0).getReg() != BaseReg there? Shouldn’t the restriction be trying to ensure that the register defined by t2LEApcrelJT also be the register used by the t2ADDrs instruction? It seems this test is being overly restrictive.
>>
>> Daniel
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From renato.golin at linaro.org Mon Jul 29 13:07:28 2013 From: renato.golin at linaro.org (Renato Golin) Date: Mon, 29 Jul 2013 21:07:28 +0100 Subject: [LLVMdev] Enabling the SLP-vectorizer by default for -O3 In-Reply-To: <99897FE3-5C07-4702-8DE8-DF76DB551609@apple.com> References: <4A0723D2-9E95-4CAF-9E83-0B33EFCA3ECE@apple.com> <1A62205A-6544-421E-B965-29C7779E877E@apple.com> <99897FE3-5C07-4702-8DE8-DF76DB551609@apple.com> Message-ID: On 29 July 2013 20:39, Jim Grosbach wrote: > These results are really excellent.
They’re on Intel, I assume, right?
> What do the ARM numbers look like? Before enabling by default, we should
> make sure that the results are comparable there as well.
>
Hi Jim, I'll have a look. --renato -------------- next part -------------- An HTML attachment was scrubbed... URL: From grosbach at apple.com Mon Jul 29 13:07:57 2013 From: grosbach at apple.com (Jim Grosbach) Date: Mon, 29 Jul 2013 13:07:57 -0700 Subject: [LLVMdev] Enabling the SLP-vectorizer by default for -O3 In-Reply-To: References: <4A0723D2-9E95-4CAF-9E83-0B33EFCA3ECE@apple.com> <1A62205A-6544-421E-B965-29C7779E877E@apple.com> <99897FE3-5C07-4702-8DE8-DF76DB551609@apple.com> Message-ID: Cool. Thanks! -Jim On Jul 29, 2013, at 1:07 PM, Renato Golin wrote:
> On 29 July 2013 20:39, Jim Grosbach wrote:
> These results are really excellent. They’re on Intel, I assume, right? What do the ARM numbers look like? Before enabling by default, we should make sure that the results are comparable there as well.
>
> Hi Jim,
>
> I'll have a look.
>
> --renato -------------- next part -------------- An HTML attachment was scrubbed... URL: From renato.golin at linaro.org Mon Jul 29 13:21:14 2013 From: renato.golin at linaro.org (Renato Golin) Date: Mon, 29 Jul 2013 21:21:14 +0100 Subject: [LLVMdev] Require Grammar for converting C to IR In-Reply-To: <99C8B2929B39C24493377AC7A121E21FC5AE58675B@USEA-EXCH8.na.uis.unisys.com> References: <99C8B2929B39C24493377AC7A121E21FC5AE58675B@USEA-EXCH8.na.uis.unisys.com> Message-ID: On 29 July 2013 20:41, Caldarale, Charles R wrote:
> > Seems that Vijay is asking about converting C program to LLVM IR
>
> Which is exactly what clang does, hence the previous answer from Eli.
>
We seem to have a communication breakdown. My quick answer was also "look at Clang", though it seems they're asking for something different... As far as I know, there aren't any C/C++ EBNF -> LLVM IR parsers available, as there aren't crazy enough folks out there...
;) A few hints: * If you're looking for the EBNF description of C/C++, google for "C++ ebnf" and you'll find plenty. * If you need a compiler for C/C++, look at Clang, and you'll find a great implementation. * If you're looking for a tool that takes a C/C++ grammar and transforms it into a parser (like bison), you may find examples in the GNU Bison docs. You can then use that to convert your AST to LLVM IR, but you'll have to hand-code the bridge yourself. cheers, --renato -------------- next part -------------- An HTML attachment was scrubbed... URL: From nlewycky at google.com Mon Jul 29 14:09:50 2013 From: nlewycky at google.com (Nick Lewycky) Date: Mon, 29 Jul 2013 14:09:50 -0700 Subject: [LLVMdev] LLVM Bay-area social, August! Message-ID: Hi everyone! It's August later this week. I know, I'm as surprised as anyone. Come celebrate the first of August with drinks, food and socializing with LLVM developers! Once again the plan is to meet at Tied House, but this time on Thursday August 1st. More details and RSVP at http://llvmbayarea.appspot.com/ . Nick -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.redmond at intel.com Mon Jul 29 14:40:09 2013 From: paul.redmond at intel.com (Redmond, Paul) Date: Mon, 29 Jul 2013 21:40:09 +0000 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: Message-ID: Hi, Several weeks ago I prototyped a feature similar to what you're describing. I was experimenting to see how one might implement a feature like ICC's -vec-report feature in clang/llvm. My approach was to create an ImmutablePass which stores notes. I modified the loop vectorizer and the unroll pass to add notes when loops were vectorized or unrolled. On the clang side I added an OptReport to the pass manager and dumped out the notes as diagnostics. It worked OK as a prototype, but getting the source locations correct was a bit fragile. I've attached some patches in case you're interested.
Paul From: Quentin Colombet > Date: Tuesday, 16 July, 2013 8:21 PM To: LLVM Developers Mailing List > Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. Hi, I would like to start a discussion about error/warning reporting in LLVM and how we can extend the current mechanism to take advantage of clang capabilities. ** Motivation ** Currently LLVM provides a way to report errors either directly (print to stderr) or by using a user-defined error handler. For instance, in inline asm parsing, we can specify the diagnostic handler to report the errors in clang. The basic idea would be to be able to do that for warnings too (and for other kinds of errors?). A motivating example can be found at the following link, where we want LLVM to be able to warn on the stack size to help developing kernels: http://llvm.org/bugs/show_bug.cgi?id=4072 By adding this capability, we would be able to have access to all the nice features clang provides with warnings: - Promote it to an error. - Ignore it. ** Challenge ** To be able to take advantage of the clang framework for warning/error reporting, warnings have to be associated with warning groups. Thus, we need a way for the backend to specify a front-end warning type. The challenge is, AFAICT (which is not much, I admit), that front-end warning types are statically handled using a tablegen representation. ** Advice Needed ** 1. Decide whether or not we want such capabilities (if we do not, we may just sporadically add support for a new warning/group of warnings/errors). 2. Come up with a plan to implement it (assuming we want it). Thanks for the feedback. Cheers, -Quentin -------------- next part -------------- A non-text attachment was scrubbed... Name: opt_report_clang.diff Type: application/octet-stream Size: 4011 bytes Desc: opt_report_clang.diff URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: opt_report_llvm.diff Type: application/octet-stream Size: 7219 bytes Desc: opt_report_llvm.diff URL: From brianherman at gmail.com Mon Jul 29 14:42:08 2013 From: brianherman at gmail.com (Brian Herman) Date: Mon, 29 Jul 2013 16:42:08 -0500 Subject: [LLVMdev] LLVM and Cygwin Message-ID: I got the following error while compiling llvm and clang under cygwin. /cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/Release+Asserts/lib/libLLVMMCJIT.a(SectionMemoryManager.o):SectionMemoryManager.cpp:(.text+0x3b): undefined reference to `__register_frame' /cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/Release+Asserts/lib/libLLVMMCJIT.a(SectionMemoryManager.o):SectionMemoryManager.cpp:(.text+0x3b): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `__register_frame' /usr/lib/gcc/x86_64-pc-cygwin/4.8.1/../../../../x86_64-pc-cygwin/bin/ld: /cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/Release+Asserts/lib/libLLVMMCJIT.a(SectionMemoryManager.o): bad reloc address 0x0 in section `.pdata' collect2: error: ld returned 1 exit status /cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/Makefile.rules:1530: recipe for target `/cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/Release+Asserts/bin/lli.exe' failed make[2]: *** [/cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/Release+Asserts/bin/lli.exe] Error 1 make[2]: Leaving directory `/cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/tools/lli' /cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/Makefile.rules:925: recipe for target `lli/.makeall' failed make[1]: *** [lli/.makeall] Error 2 make[1]: Leaving directory `/cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/tools' /cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/Makefile.rules:876: recipe for target `all' failed make: *** [all] Error 1 I have no idea what that means. -- Thanks, Brian Herman college.nfshost.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ofv at wanadoo.es Mon Jul 29 16:02:01 2013 From: ofv at wanadoo.es (Óscar Fuentes) Date: Tue, 30 Jul 2013 01:02:01 +0200 Subject: [LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI References: <51F6676E.2090800@free.fr> <51F670DF.5080804@free.fr> Message-ID: <878v0phswm.fsf@wanadoo.es> Duncan Sands writes:
> On 29/07/13 15:30, Anton Korobeynikov wrote:
>>>> object in place on the stack or at least call its copy constructor.
>>>
>>> what does GCC do?
>> Nothing. It does not support MSVC ABI.
>
> Maybe we shouldn't either :)
Right. What's the point of all the effort devoted to MSVC++ ABI compatibility when Clang doesn't need it for being a top-notch C++ compiler on Windows? From atrick at apple.com Mon Jul 29 16:07:30 2013 From: atrick at apple.com (Andrew Trick) Date: Mon, 29 Jul 2013 16:07:30 -0700 Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: <51F46A27.2000502@gmail.com> References: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> <51E74E72.5000902@gmail.com> <51F46A27.2000502@gmail.com> Message-ID: <0925EAF0-DC58-4EF5-BF99-447CC5552E77@apple.com> On Jul 27, 2013, at 5:47 PM, Shuxin Yang wrote:
> Hi, Sean:
>
> I'm sorry I lied. I didn't mean to lie. I did try to avoid making a *BIG* change
> to the IPO pass-ordering for now. However, when I make a minor change to
> populateLTOPassManager() by separating module-pass and non-module-passes, I
> saw quite a few performance differences, most of them degradations. Attacking
> these degradations one by one in a piecemeal manner is wasting time. We might as
> well define the pass-ordering for Pre-IPO, IPO and Post-IPO phases at this time,
> and hopefully once and for all.
>
> In order to repair the image of being a liar, I post some preliminary results on this cozy
> Saturday afternoon, which I normally devote to daydreaming :-)
>
> So far I have only measured the results of the MultiSource benchmarks on my iMac (late
> 2012 model), and the command to run the benchmark is
> "make TEST=simple report OPTFLAGS='-O3 -flto'".
>
> In terms of execution-time, some degrade, but more improve, and a few of them
> are quite substantial. User-time is used for comparison. I measured the
> result twice; they are basically very stable. As far as I can tell from the result,
> the proposed pass-ordering is basically a good change.
>
> Interestingly enough, if I combine the populatePreIPOPassMgr() as the preIPO phase
> (see the patch) with the original populateLTOPassManager() for both IPO and postIPO,
> I see a significant improvement to "Benchmarks/Trimaran/netbench-crc/netbench-crc"
> (about 94%, 0.5665s(was) vs 0.0295s). As I write this mail, I have not yet had a chance
> to figure out why this combination improves this benchmark this much.
>
> In terms of compile-time, the result reports my change improves the compile
> time by about 2x, which is nonsense. I guess the test-script doesn't count
> link-time.
>
> The new pass ordering Pre-IPO, IPO, and PostIPO are defined by
> populate{PreIPO|IPO|PostIPO}PassMgr().
>
> I will discuss with Andy next Monday in order to be consistent with the
> pass-ordering design he is envisioning, and measure more benchmarks, then
> post the patch and result to the community for discussion and approval.
>
> Thanks
> Shuxin
I don't have any objection to this as long as your compile times are comparable. The major differences that I could spot are: You've moved the second iteration of some scalar opts into post-IPO: - JumpThreading - CorrelatedValueProp You no longer run InstCombine after the first round of scalar opts (in preIPO) and before the second round (in PostIPO). You now have an extra (3rd) SROA in PostIPO.
I don't see a problem, but I'd like to understand the rationale. I think it would be valuable to capture some of the motivation behind the standard pass ordering and any changes we make to it. Sometimes part of the design becomes obsolete but no one can be sure. Shall we start a new doc under LLVM subsystems? -Andy From Xiaoyi.Guo at amd.com Mon Jul 29 16:08:08 2013 From: Xiaoyi.Guo at amd.com (Guo, Xiaoyi) Date: Mon, 29 Jul 2013 23:08:08 +0000 Subject: [LLVMdev] creating SCEV taking too long In-Reply-To: References: Message-ID: Hi, We have a benchmark where there are 128 MAD computations in a loop. (See the attached IR.) Creating SCEVs for these expressions takes a long time, making the compile time too long. E.g., running opt with the "indvars" pass alone takes 45 seconds. It seems that the majority of the time is spent in comparing the complexity of the expression operands to sort them. I realize that the expression grows to be really large towards the end of the loop. I don't know all the uses of the built SCEV. But I imagine it won't be very useful for such complex expressions. Yet, it's making the compile time much longer. So I'm wondering if it would make sense to abort the creation of SCEV when the expression gets really complex and large. Or is there any way to further optimize the performance of SCEV building for such cases. Thanks in advance for any response. Xiaoyi -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: MAD_test.ll Type: application/octet-stream Size: 11406 bytes Desc: MAD_test.ll URL: From dberlin at dberlin.org Mon Jul 29 16:18:05 2013 From: dberlin at dberlin.org (Daniel Berlin) Date: Mon, 29 Jul 2013 16:18:05 -0700 Subject: [LLVMdev] creating SCEV taking too long In-Reply-To: References: Message-ID: On Mon, Jul 29, 2013 at 4:08 PM, Guo, Xiaoyi wrote:
> Hi,
>
> We have a benchmark where there are 128 MAD computations in a loop. (See
> the attached IR.) Creating SCEVs for these expressions takes a long time,
> making the compile time too long. E.g., running opt with the "indvars" pass
> only takes 45 seconds.
>
> It seems that the majority of the time is spent in comparing the
> complexity of the expression operands to sort them.
>
Why not just fix this then? I assume the issue is that they all end up with the same length/complexity, so it both falls into the N^2 loop in GroupByComplexity, and the sort itself takes a long time because it compares operand by operand. This seems easy to fix by having GroupByComplexity calculate a very cheap hash prior to sorting when the number of operands is large, and then using that hash before recursively comparing element by element to distinguish the cases. (You could also create/update this hash as you go, but that seems like it would be more work) Unless of course, they really are all the same expression, in which case, this is harder :)
>
> I realize that the expression grows to be really large towards the end of
> the loop.
>
> I don't know all the uses of the built SCEV. But I imagine it won't be
> very useful for such complex expressions. Yet, it's making the compile time
> much longer.
>
> So I'm wondering if it would make sense to abort the creation of SCEV when
> the expression gets really complex and large.
Or is there any way to
> further optimize the performance of SCEV building for such cases.
>
> Thanks in advance for any response.
>
> Xiaoyi
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
-------------- next part -------------- An HTML attachment was scrubbed... URL: From silvas at purdue.edu Mon Jul 29 16:18:54 2013 From: silvas at purdue.edu (Sean Silva) Date: Mon, 29 Jul 2013 16:18:54 -0700 Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: <0925EAF0-DC58-4EF5-BF99-447CC5552E77@apple.com> References: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> <51E74E72.5000902@gmail.com> <51F46A27.2000502@gmail.com> <0925EAF0-DC58-4EF5-BF99-447CC5552E77@apple.com> Message-ID: On Mon, Jul 29, 2013 at 4:07 PM, Andrew Trick wrote:
>
> I don't see a problem, but I'd like to understand the rationale. I think
> it would be valuable to capture some of the motivation behind the standard
> pass ordering and any changes we make to it. Sometimes part of the design
> becomes obsolete but no one can be sure. Shall we start a new doc under
> LLVM subsystems?
>
Starting a new doc sounds like a good idea to me. If you aren't familiar with adding to the Sphinx docs, the sphinx quickstart template will get you up and running. -- Sean Silva -------------- next part -------------- An HTML attachment was scrubbed... URL: From hfinkel at anl.gov Mon Jul 29 16:24:19 2013 From: hfinkel at anl.gov (Hal Finkel) Date: Mon, 29 Jul 2013 18:24:19 -0500 (CDT) Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: <0925EAF0-DC58-4EF5-BF99-447CC5552E77@apple.com> Message-ID: <189556652.15398644.1375140259285.JavaMail.root@alcf.anl.gov> ----- Original Message -----
>
> On Jul 27, 2013, at 5:47 PM, Shuxin Yang
> wrote:
>
> > Hi, Sean:
> >
> > I'm sorry I lied. I didn't mean to lie.
I did try to avoid making
> > a *BIG* change
> > to the IPO pass-ordering for now. However, when I make a minor
> > change to
> > populateLTOPassManager() by separating module-pass and
> > non-module-passes, I
> > saw quite a few performance differences, most of them
> > degradations. Attacking
> > these degradations one by one in a piecemeal manner is wasting
> > time. We might as
> > well define the pass-ordering for Pre-IPO, IPO and Post-IPO phases
> > at this time,
> > and hopefully once and for all.
> >
> > In order to repair the image of being a liar, I post some
> > preliminary results on this cozy
> > Saturday afternoon, which I normally devote to daydreaming :-)
> >
> > So far I have only measured the results of the MultiSource benchmarks on my
> > iMac (late
> > 2012 model), and the command to run the benchmark is
> > "make TEST=simple report OPTFLAGS='-O3 -flto'".
> >
> > In terms of execution-time, some degrade, but more improve, and a few of
> > them
> > are quite substantial. User-time is used for comparison. I measured
> > the
> > result twice; they are basically very stable. As far as I can tell
> > from the result,
> > the proposed pass-ordering is basically a good change.
> >
> > Interestingly enough, if I combine the populatePreIPOPassMgr() as
> > the preIPO phase
> > (see the patch) with the original populateLTOPassManager() for both IPO
> > and postIPO,
> > I see a significant improvement to
> > "Benchmarks/Trimaran/netbench-crc/netbench-crc"
> > (about 94%, 0.5665s(was) vs 0.0295s). As I write this mail, I
> > have not yet had a chance
> > to figure out why this combination improves this benchmark this
> > much.
> >
> > In terms of compile-time, the result reports my change improves the
> > compile
> > time by about 2x, which is nonsense. I guess the test-script doesn't
> > count
> > link-time.
> >
> > The new pass ordering Pre-IPO, IPO, and PostIPO are defined by
> > populate{PreIPO|IPO|PostIPO}PassMgr().
> > > > I will discuss with Andy next Monday in order to be consistent > > with the > > pass-ordering design he is envisioning, and measure more benchmarks > > then > > post the patch and result to the community for discussion and > > approval. > > > > Thanks > > Shuxin > > I don't have any objection to this as long as your compile times are > comparable. > > The major differences that I could spot are: > > You've moved the second iteration of some scalar opts into post-IPO: > - JumpThreading > - CorrelatedValueProp > > You no longer run InstCombine after the first round of scalar opts > (in preIPO) and before the second round (in PostIPO). > > You now have an extra (3rd) SROA in PostIPO. > > I don't see a problem, but I'd like to understand the rationale. I > think it would be valuable to capture some of the motivation behind > the standard pass ordering and any changes we make to it. Sometimes > part of the design becomes obsolete but no one can be sure. Out of curiosity, has anyone tried to optimize the pass ordering in some (quasi-)automated way? Naively, a genetic algorithm seems like a perfect fit for this. -Hal > Shall we > start a new doc under LLVM subsystems? 
>
> -Andy
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory From atrick at apple.com Mon Jul 29 16:28:46 2013 From: atrick at apple.com (Andrew Trick) Date: Mon, 29 Jul 2013 16:28:46 -0700 Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: <51F692C1.2080901@codeaurora.org> References: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> <51F692C1.2080901@codeaurora.org> Message-ID: <35E073FE-A0E4-4DE9-808C-050EEEDE00CE@apple.com> On Jul 29, 2013, at 9:05 AM, Krzysztof Parzyszek wrote:
> On 7/16/2013 11:38 PM, Andrew Trick wrote:
>> Since introducing the new TargetTransformInfo analysis, there has been some confusion over the role of target heuristics in IR passes. A few patches have led to interesting discussions.
>>
>> To centralize the discussion, until we get some documentation and better APIs in place, let me throw out an oversimplified Straw Man for a new pass pipeline. It serves two purposes: (1) an overdue reorganization of the pass pipeline (2) a formalization of the role of TargetTransformInfo.
>>
>> ---
>> Canonicalization passes are designed to normalize the IR in order to expose opportunities to subsequent machine independent passes. This simplifies writing machine independent optimizations and improves the quality of the compiler.
>>
>> An important property of these passes is that they are repeatable. They may be invoked multiple times after inlining and should converge to a canonical form. They should not destructively transform the IR in a way that defeats subsequent analysis.
>>
>> Canonicalization passes can make use of data layout and are affected by ABI, but are otherwise target independent. Adding target specific hooks to these passes can defeat the purpose of canonical IR.
>>
>> IR Canonicalization Pipeline:
>>
>> Function Passes {
>> SimplifyCFG
>> SROA-1
>> EarlyCSE
>> }
>> Call-Graph SCC Passes {
>> Inline
>> Function Passes {
>> EarlyCSE
>> SimplifyCFG
>> InstCombine
>> Early Loop Opts {
>> LoopSimplify
>> Rotate (when obvious)
>> Full-Unroll (when obvious)
>> }
>> SROA-2
>> InstCombine
>> GVN
>> Reassociate
>> Generic Loop Opts {
>> LICM (Rotate on-demand)
>> Unswitch
>> }
>> SCCP
>> InstCombine
>> JumpThreading
>> CorrelatedValuePropagation
>> AggressiveDCE
>> }
>> }
>>
>
> I'm a bit late to this, but the examples of the "generic loop opts" above are really better left until later. They have the potential to obscure the code and make other loop optimizations harder. Specifically, there has to be a place where loop nest optimizations can be done (such as loop interchange or unroll-and-jam). There is also array expansion and loop distribution, which can be highly target-dependent in terms of their applicability. I don't know if TTI could provide enough details to account for all circumstances that would motivate such transformations, but assuming that it could, there still needs to be room left for it in the design.
You mean that LICM and Unswitching should be left for later? For the purpose of exposing scalar optimizations, I'm not sure I agree with that but I'd be interested in examples. I think you're only worried about the impact on loop nest optimizations. Admittedly I'm not making much concession for that, because I think of loop nest optimization as a different tool that will probably want fairly substantial changes to the pass pipeline anyway. Here are a few ways it might work: (1) Loop nest optimizer extends the standard PMB by plugging in its own passes prior to Generic Loop Opts in addition to loading TTI. The loop nest optimizer's passes are free to query TTI. (2) Loop nest optimizer suppresses generic loop opts through a PMB flag (assuming they are too disruptive).
It registers its own loop passes with the Target Loop Opts. It registers instances of generic loop opts to now run after loop nest optimization, and registers new instances of scalar opts to rerun after Target Loop Opts if needed. (3) If the loop nest optimizer were part of llvm core libs, then we could have a completely separate passmanager builder for it.
> On a different, but related note---one thing I've asked recently was about the "proper" solution for recognizing target specific loop idioms. On Hexagon, we have builtin functions that handle certain specific loop patterns. In order to separate the target-dependent code from the target-independent, we would basically have to replicate the loop idiom recognition in our own target-specific pass. Not only that, but it would have to run before the loops may be subjected to other optimizations that could obfuscate the opportunity. To solve this, I was thinking about having target-specific hooks in the idiom recognition code, that could transform a given loop in the target's own way. Still, that would imply target-specific transformations running before the "official" lowering code.
We may be able to run loop idiom recognition as part of Target Loop Opts. If that misses too many optimizations, then targets can add a second instance of loop-idiom in the target loop opts. Targets can also add extra instances of scalar opts passes in the lowering pipeline, if needed, to clean up. The lowering pass order should be completely configurable. Are you afraid that LICM and unswitching will obfuscate the loops to the point that you can’t recognize the idiom? The current pass pipeline would have the same problem.
-Andy From chisophugis at gmail.com Mon Jul 29 16:38:54 2013 From: chisophugis at gmail.com (Sean Silva) Date: Mon, 29 Jul 2013 16:38:54 -0700 Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: <189556652.15398644.1375140259285.JavaMail.root@alcf.anl.gov> References: <0925EAF0-DC58-4EF5-BF99-447CC5552E77@apple.com> <189556652.15398644.1375140259285.JavaMail.root@alcf.anl.gov> Message-ID: On Mon, Jul 29, 2013 at 4:24 PM, Hal Finkel wrote:
>
> Out of curiosity, has anyone tried to optimize the pass ordering in some
> (quasi-)automated way? Naively, a genetic algorithm seems like a perfect
> fit for this.
>
This is the closest I've seen: http://donsbot.wordpress.com/2010/03/01/evolving-faster-haskell-programs-now-with-llvm/ However, it deals with a "toy" example. Doing something similar over an entire benchmark suite would be interesting (and it may find non-obvious, highly-profitable interactions between passes that we aren't currently exploiting). -- Sean Silva -------------- next part -------------- An HTML attachment was scrubbed... URL: From shuxin.llvm at gmail.com Mon Jul 29 16:39:12 2013 From: shuxin.llvm at gmail.com (Shuxin Yang) Date: Mon, 29 Jul 2013 16:39:12 -0700 Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: <0925EAF0-DC58-4EF5-BF99-447CC5552E77@apple.com> References: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> <51E74E72.5000902@gmail.com> <51F46A27.2000502@gmail.com> <0925EAF0-DC58-4EF5-BF99-447CC5552E77@apple.com> Message-ID: <51F6FD20.8000305@gmail.com> On 7/29/13 4:07 PM, Andrew Trick wrote:
> On Jul 27, 2013, at 5:47 PM, Shuxin Yang wrote:
>
>> Hi, Sean:
>>
>> I'm sorry I lied. I didn't mean to lie. I did try to avoid making a *BIG* change
>> to the IPO pass-ordering for now. However, when I make a minor change to
>> populateLTOPassManager() by separating module-pass and non-module-passes, I
>> saw quite a few performance differences, most of them degradations.
Attacking >> these degradations one by one in a piecemeal manner is wasting time. We might as >> well define the pass-ordering for Pre-IPO, IPO and Post-IPO phases at this time, >> and hopefully once and for all. >> >> In order to repair the image of being a liar, I post some preliminary results on this cozy >> Saturday afternoon which I normally devote to daydreaming :-) >> >> So far I have only measured the result of the MultiSource benchmarks on my iMac (late >> 2012 model), and the command to run the benchmark is >> "make TEST=simple report OPTFLAGS='-O3 -flto'". >> >> In terms of execution-time, some degrade, but more improve, and a few of them >> are quite substantial. User-time is used for comparison. I measured the >> result twice; they are basically very stable. As far as I can tell from the result, >> the proposed pass-ordering is basically a change for the better. >> >> Interestingly enough, if I combine the populatePreIPOPassMgr() as the preIPO phase >> (see the patch) with the original populateLTOPassManager() for both IPO and postIPO, >> I see a significant improvement to "Benchmarks/Trimaran/netbench-crc/netbench-crc" >> (about 94%, 0.5665s(was) vs 0.0295s); as I write this mail, I have not yet had a chance >> to figure out why this combination improves this benchmark this much. >> >> In terms of compile-time, the result reports my change improves the compile >> time by about 2x, which is nonsense. I guess the test-script doesn't count >> link-time. >> >> The new pass orderings Pre-IPO, IPO, and PostIPO are defined by >> populate{PreIPO|IPO|PostIPO}PassMgr(). >> >> I will discuss with Andy next Monday in order to be consistent with the >> pass-ordering design he is envisioning, and measure more benchmarks, then >> post the patch and results to the community for discussion and approval. >> >> Thanks >> Shuxin
> > The major differences that I could spot are: > > You've moved the second iteration of some scalar opts into post-IPO: > - JumpThreading > - CorrelatedValueProp I don't see why we need so many iterations. So I got rid of them. > > You no longer run InstCombine after the first round of scalar opts (in preIPO) and before the second round (in PostIPO). > > You now have an extra (3rd) SROA in PostIPO. I call the SROA for dead code elimination, seriously! The dead-whatever-elimination passes (even if they are called aggressive) do not eliminate the last store to a local variable. Shame! Shame! Shame! It seems we don't have a better way since we don't like mem-ssa. We have to call SROA, an all-in-one algorithm, to perform such stuff. > > I don't see a problem, but I'd like to understand the rationale. I think it would be valuable to capture some of the motivation behind the standard pass ordering and any changes we make to it. Sometimes part of the design becomes obsolete but no one can be sure. Shall we start a new doc under LLVM subsystems? > > -Andy From chisophugis at gmail.com Mon Jul 29 16:41:44 2013 From: chisophugis at gmail.com (Sean Silva) Date: Mon, 29 Jul 2013 16:41:44 -0700 Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: References: <0925EAF0-DC58-4EF5-BF99-447CC5552E77@apple.com> <189556652.15398644.1375140259285.JavaMail.root@alcf.anl.gov> Message-ID: On Mon, Jul 29, 2013 at 4:38 PM, Sean Silva wrote: > > > > On Mon, Jul 29, 2013 at 4:24 PM, Hal Finkel wrote: >> >> Out of curiosity, has anyone tried to optimize the pass ordering in some >> (quasi-)automated way? Naively, a genetic algorithm seems like a perfect >> fit for this. >> > > This is the closest I've seen: > http://donsbot.wordpress.com/2010/03/01/evolving-faster-haskell-programs-now-with-llvm/ > > > However, it deals with a "toy" example.
Doing something similar over an > entire benchmark suite would be interesting (and it may find non-obvious, > highly-profitable interactions between passes that we aren't currently > exploiting). > > One more I saw recently: http://gcc.gnu.org/wiki/cauldron2013?action=AttachFile&do=get&target=machine_guided_energy_energy_efficient_compilation.pdf (it deals with LLVM and GCC, but focuses on "energy efficiency" and not raw performance). -------------- next part -------------- An HTML attachment was scrubbed... URL: From shuxin.llvm at gmail.com Mon Jul 29 16:47:51 2013 From: shuxin.llvm at gmail.com (Shuxin Yang) Date: Mon, 29 Jul 2013 16:47:51 -0700 Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: <189556652.15398644.1375140259285.JavaMail.root@alcf.anl.gov> References: <189556652.15398644.1375140259285.JavaMail.root@alcf.anl.gov> Message-ID: <51F6FF27.9010705@gmail.com> I personally strongly abhor this kind of thing:-) I guess I should be more open-minded. For the pre-IPO phase, some passes should not be invoked: say, loop nest optimization, loop versioning, aggressive loop unrolling, vectorization, or aggressive inlining. The reason is that they will hinder the downstream optimizers if they kick in early. > Out of curiosity, has anyone tried to optimize the pass ordering in some (quasi-)automated way? Naively, a genetic algorithm seems like a perfect fit for this. > > -Hal > >> Shall we >> start a new doc under LLVM subsystems?
>> >> -Andy >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> From hfinkel at anl.gov Mon Jul 29 20:43:35 2013 From: hfinkel at anl.gov (Hal Finkel) Date: Mon, 29 Jul 2013 22:43:35 -0500 (CDT) Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: <51F6FF27.9010705@gmail.com> Message-ID: <1227639731.15422607.1375155815177.JavaMail.root@alcf.anl.gov> ----- Original Message ----- > I personally strong abhor this kind of thing:-) I guess I should be > more > open-minded. In my opinion, there are really two different kinds of uses for automated optimization: 1. Fundamental design questions (such as the overall structure of the pass schedule). For these, using an optimization algorithm can be interesting, but mostly just to make sure that things are working as we expect. 2. Purely empirical questions (such as exactly where to run instcombine). For these questions the overall design provides a range of theoretically-equally-good configurations, and picking an optimum based on compile-time-vs-code-performance for different optimization levels (for some particular set of target applications) is a perfectly justifiable use of autotuning. -Hal > > For pre-ipo phase, some passes should not invoke, say, any loop > nest-opt, loop version, aggressive loop unrolling, > vectorization, aggressive inling. > > The reasons are they will hinder the downstream optimizers if they > kick > in early. > > > Out of curiosity, has anyone tried to optimize the pass ordering in > > some (quasi-)automated way? Naively, a genetic algorithm seems > > like a perfect fit for this. > > > > -Hal > > > >> Shall we > >> start a new doc under LLVM subsystems? 
> >> > >> -Andy > >> _______________________________________________ > >> LLVM Developers mailing list > >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >> > > -- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory From Xiaoyi.Guo at amd.com Mon Jul 29 20:48:16 2013 From: Xiaoyi.Guo at amd.com (Guo, Xiaoyi) Date: Tue, 30 Jul 2013 03:48:16 +0000 Subject: [LLVMdev] creating SCEV taking too long In-Reply-To: References: Message-ID: Thank you very much for your reply. Do you mean calculate the hash based on element SCEV pointers? But according to the comments before GroupByComplexity(): /// Note that we go take special precautions to ensure that we get deterministic /// results from this routine. In other words, we don't want the results of /// this to depend on where the addresses of various SCEV objects happened to /// land in memory. Xiaoyi From: Daniel Berlin [mailto:dberlin at dberlin.org] Sent: Monday, July 29, 2013 4:18 PM To: Guo, Xiaoyi Cc: LLVMdev at cs.uiuc.edu Subject: Re: [LLVMdev] creating SCEV taking too long On Mon, Jul 29, 2013 at 4:08 PM, Guo, Xiaoyi > wrote: Hi, We have a benchmark where there are 128 MAD computations in a loop. (See the attached IR.) Creating SCEVs for these expressions takes a long time, making the compile time too long. E.g., running opt with the "indvars" pass only takes 45 seconds. It seems that the majority of the time is spent in comparing the complexity of the expression operands to sort them. Why not just fix this then? I assume the issue is that they all end up with the same length/complexity, so it both falls into the N^2 loop in GroupByComplexity, and the sort itself takes a long time because it compares operand by operand. 
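To make the suggestion concrete, here is a small self-contained sketch of the hash-before-deep-compare idea. The `Expr` type, its fields, and `sortByComplexity` are hypothetical stand-ins for illustration only; they are not LLVM's SCEV classes, and the real complexity ordering has more cases (constants first, grouping by expression type, and so on):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Toy stand-in for a SCEV-like expression node (hypothetical type).
struct Expr {
  unsigned Opcode;                // kind of node
  std::vector<const Expr *> Ops;  // operand nodes, built bottom-up
  uint64_t CachedHash;            // cheap structural hash, computed once

  Expr(unsigned Opc, std::vector<const Expr *> Operands)
      : Opcode(Opc), Ops(std::move(Operands)), CachedHash(computeHash()) {}

  uint64_t computeHash() const {
    uint64_t H = Opcode * 0x9E3779B97F4A7C15ULL;
    for (const Expr *Op : Ops)
      H = H * 31 + Op->CachedHash;  // operands were hashed when constructed
    return H;
  }
};

// The expensive recursive structural comparison (worst case O(size)).
static int deepCompare(const Expr *A, const Expr *B) {
  if (A->Opcode != B->Opcode)
    return A->Opcode < B->Opcode ? -1 : 1;
  if (A->Ops.size() != B->Ops.size())
    return A->Ops.size() < B->Ops.size() ? -1 : 1;
  for (std::size_t I = 0; I != A->Ops.size(); ++I)
    if (int C = deepCompare(A->Ops[I], B->Ops[I]))
      return C;
  return 0;
}

// Sort consulting the cheap cached hash first; the recursive walk only
// runs when two hashes collide.
static void sortByComplexity(std::vector<const Expr *> &Exprs) {
  std::sort(Exprs.begin(), Exprs.end(), [](const Expr *A, const Expr *B) {
    if (A->CachedHash != B->CachedHash)
      return A->CachedHash < B->CachedHash;
    return deepCompare(A, B) < 0;
  });
}
```

Structurally identical expressions always get equal cached hashes, so they still end up adjacent after sorting, and most comparisons become a single integer compare; if the expressions really are all structurally identical, every comparison still falls through to the recursive case, and hashing alone does not help.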
This seems easy to fix by having GroupByComplexity calculate a very cheap hash prior to sorting when the number of operands is large, and then using that hash before recursively comparing element by element to distinguish the cases. (You could also create/update this hash as you go, but that seems like it would be more work.) Unless of course, they really are all the same expression, in which case, this is harder :) I realize that the expression grows to be really large towards the end of the loop. I don't know of all the uses of the built SCEV. But I imagine it won't be very useful for such complex expressions. Yet, it's making the compile time much longer. So I'm wondering if it would make sense to abort the creation of SCEV when the expression gets really complex and large. Or is there any way to further optimize the performance of SCEV building for such cases? Thanks in advance for any response. Xiaoyi _______________________________________________ LLVM Developers mailing list LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From pli at cs.rochester.edu Mon Jul 29 21:55:54 2013 From: pli at cs.rochester.edu (lipengcheng) Date: Mon, 29 Jul 2013 21:55:54 -0700 Subject: [LLVMdev] Questions about BB number changes Message-ID: Hi, All, Thanks for your answer ahead of time. I plugged a new pass into LLVM that inserts one line of code into each BB to print the current BB number. I insert this pass in the Instrumentation phase. But after code generation, I found the total number of BBs is different from the number shown in the assembly code. For example, when I traverse all BBs during the instrumentation phase, there are 30000. But after code generation, there are 30100 BBs. For small programs, there is no such issue. Can you tell me which of the succeeding phases changes the BB number? How can I deal with this?
Thanks, Best, Pengcheng From arphaman at gmail.com Tue Jul 30 01:15:52 2013 From: arphaman at gmail.com (Alex L) Date: Tue, 30 Jul 2013 09:15:52 +0100 Subject: [LLVMdev] [GSoC] flang midterm progress report Message-ID: Hi everyone! My flang GSoC has been going great so far, and I have achieved a lot. I wrote a report describing some of the results and instructions on how to use flang: http://flang-gsoc.blogspot.ie/2013/07/gsoc-midterm-progress-report.html Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From rasha.sala7 at gmail.com Tue Jul 30 05:44:27 2013 From: rasha.sala7 at gmail.com (Rasha Omar) Date: Tue, 30 Jul 2013 14:44:27 +0200 Subject: [LLVMdev] Instruction insertion By Module Pass Message-ID: Hi, I need to insert a new instruction, like x=1 or a while loop, into every basic block. I tried this code, but it doesn't work: Type * Int32Type = IntegerType::getInt32Ty(getGlobalContext()); AllocaInst* newInst = new AllocaInst(Int32Type,"flag", Bb); Bb->getInstList().push_back(newInst); the error: void llvm::SymbolTableListTraits<ValueSubClass, ItemParentClass>::addNodeToList(ValueSubClass *) [ValueSubClass = llvm::Instruction, ItemParentClass = llvm::BasicBlock]: Assertion `V->getParent() == 0 && "Value already in a container!!"' failed. Is there a class I could use to insert a while loop in a Module Pass? Thank you in advance -- * Rasha Salah Omar Msc Student at E-JUST Demonestrator at Faculty of Computers and Informatics Benha University* * e-mail: rasha.omar at ejust.edu.eg* P* Please consider the environment before printing this email.* -------------- next part -------------- An HTML attachment was scrubbed... URL: From baldrick at free.fr Tue Jul 30 05:51:23 2013 From: baldrick at free.fr (Duncan Sands) Date: Tue, 30 Jul 2013 14:51:23 +0200 Subject: [LLVMdev] LLVM and Cygwin In-Reply-To: References: Message-ID: <51F7B6CB.80603@free.fr> Hi Brian, On 29/07/13 23:42, Brian Herman wrote: > I got the following error while compiling llvm and clang under cygwin.
> > /cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/Release+Asserts/lib/libLLVMMCJIT.a(SectionMemoryManager.o):SectionMemoryManager.cpp:(.text+0x3b): > undefined reference to `__register_frame' __register_frame is used to enable the debugger (gdb) to debug JIT'd code. It is a function provided by libgcc, to be more precise in libgcc_eh. Is it in your copy? $ nm libgcc_eh.a | grep register_fram 0000000000001960 T __deregister_frame 0000000000001950 T __deregister_frame_info 0000000000001830 T __deregister_frame_info_bases 0000000000001750 T __register_frame 0000000000001740 T __register_frame_info 00000000000016b0 T __register_frame_info_bases 0000000000001800 T __register_frame_info_table 0000000000001780 T __register_frame_info_table_bases 0000000000001810 T __register_frame_table Ciao, Duncan. > /cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/Release+Asserts/lib/libLLVMMCJIT.a(SectionMemoryManager.o):SectionMemoryManager.cpp:(.text+0x3b): > relocation truncated to fit: R_X86_64_PC32 against undefined symbol > `__register_frame' > /usr/lib/gcc/x86_64-pc-cygwin/4.8.1/../../../../x86_64-pc-cygwin/bin/ld: > /cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/Release+Asserts/lib/libLLVMMCJIT.a(SectionMemoryManager.o): > bad reloc address 0x0 in section `.pdata' > collect2: error: ld returned 1 exit status > /cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/Makefile.rules:1530: > recipe for target > `/cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/Release+Asserts/bin/lli.exe' > failed > make[2]: *** > [/cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/Release+Asserts/bin/lli.exe] > Error 1 > make[2]: Leaving directory > `/cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/tools/lli' > /cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/Makefile.rules:925: > recipe for target `lli/.makeall' failed > make[1]: *** [lli/.makeall] Error 2 > make[1]: Leaving directory >
`/cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/tools' > /cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/Makefile.rules:876: > recipe for target `all' failed > make: *** [all] Error 1 > I have no idea what that means. > > -- > > > Thanks, > Brian Herman > college.nfshost.com > > > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From brianherman at gmail.com Tue Jul 30 06:01:21 2013 From: brianherman at gmail.com (Brian Herman) Date: Tue, 30 Jul 2013 08:01:21 -0500 Subject: [LLVMdev] LLVM and Cygwin In-Reply-To: <51F7B6CB.80603@free.fr> References: <51F7B6CB.80603@free.fr> Message-ID: I get this when I type: brianherman at windows-8-[REDACTED] ~ $ nm libgcc_eh.a | grep register_frame nm: 'libgcc_eh.a': No such file brianherman at windows-8-[REDACTED] ~ $ nm libgcc_eh.a | grep register_fram nm: 'libgcc_eh.a': No such file brianherman at windows-8-[REDACTED] ~ $ On Tue, Jul 30, 2013 at 7:51 AM, Duncan Sands wrote: > Hi Brian, > > > On 29/07/13 23:42, Brian Herman wrote: > >> I got the following error while compiling llvm and clang under cygwin. >> >> /cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/ >> Release+Asserts/lib/libLLVMMCJIT.a(SectionMemoryManager.o): >> SectionMemoryManager.cpp:(.text+0x3b): >> undefined reference to `__register_frame' >> > > __register_frame is used to enable the debugger (gdb) to debug JIT'd code. > It > is a function provided by libgcc, to be more precise in libgcc_eh. Is it > in > your copy?
> > $ nm libgcc_eh.a | grep register_fram > 0000000000001960 T __deregister_frame > 0000000000001950 T __deregister_frame_info > 0000000000001830 T __deregister_frame_info_bases > 0000000000001750 T __register_frame > 0000000000001740 T __register_frame_info > 00000000000016b0 T __register_frame_info_bases > 0000000000001800 T __register_frame_info_table > 0000000000001780 T __register_frame_info_table_bases > 0000000000001810 T __register_frame_table > > Ciao, Duncan. > > /cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/ >> Release+Asserts/lib/libLLVMMCJIT.a(SectionMemoryManager.o): >> SectionMemoryManager.cpp:(.text+0x3b): >> relocation truncated to fit: R_X86_64_PC32 against undefined symbol >> `__register_frame' >> /usr/lib/gcc/x86_64-pc-cygwin/4.8.1/../../../../x86_64-pc- >> cygwin/bin/ld: >> /cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/ >> Release+Asserts/lib/libLLVMMCJIT.a(SectionMemoryManager.o): >> bad reloc address 0x0 in section `.pdata' >> collect2: error: ld returned 1 exit status >> /cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/ >> Makefile.rules:1530: >> recipe for target >> `/cygdrive/c/Users/brianherman/Desktop/llvm/llvm- >> 3.3.src/Release+Asserts/bin/lli.exe' >> failed >> make[2]: *** >> [/cygdrive/c/Users/brianherman/Desktop/llvm/llvm- >> 3.3.src/Release+Asserts/bin/lli.exe] >> Error 1 >> make[2]: Leaving directory >> `/cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/tools/lli' >> /cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/ >> Makefile.rules:925: >> recipe for target `lli/.makeall' failed >> make[1]: *** [lli/.makeall] Error 2 >> make[1]: Leaving directory >> `/cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/tools' >> /cygdrive/c/Users/brianherman/Desktop/llvm/llvm-3.3.src/ >> Makefile.rules:876: >> recipe for target `all' failed >> make: *** [all] Error 1 >> I have no idea what that means.
>> >> -- >> >> >> Thanks, >> Brian Herman >> college.nfshost.com >> >> >> >> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -- Thanks, Brian Herman college.nfshost.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From t.p.northover at gmail.com Tue Jul 30 06:10:18 2013 From: t.p.northover at gmail.com (Tim Northover) Date: Tue, 30 Jul 2013 14:10:18 +0100 Subject: [LLVMdev] LLVM and Cygwin In-Reply-To: References: <51F7B6CB.80603@free.fr> Message-ID: > $ nm libgcc_eh.a | grep register_frame > nm: 'libgcc_eh.a': No such file I think he meant to find out where libgcc_eh.a lives under Cygwin and execute the command on that file. It should be somewhere amongst the stuff installed with gcc, but the exact location can vary quite a bit. Cheers. Tim.
From brianherman at gmail.com Tue Jul 30 06:19:28 2013 From: brianherman at gmail.com (Brian Herman) Date: Tue, 30 Jul 2013 08:19:28 -0500 Subject: [LLVMdev] LLVM and Cygwin In-Reply-To: References: <51F7B6CB.80603@free.fr> Message-ID: brianherman at windows-8-doesn't rock /lib/gcc/i686-pc-cygwin/4.7.3 $ nm libgcc_eh.a | grep register_frame 000011b0 T ___deregister_frame 000011a0 T ___deregister_frame_info 000010d0 T ___deregister_frame_info_bases 00000fe0 T ___register_frame 00000fb0 T ___register_frame_info 00000f40 T ___register_frame_info_bases 00001070 T ___register_frame_info_table 00001010 T ___register_frame_info_table_bases 000010a0 T ___register_frame_table On Tue, Jul 30, 2013 at 8:10 AM, Tim Northover wrote: > > $ nm libgcc_eh.a | grep register_frame > > nm: 'libgcc_eh.a': No such file > > I think he meant to find out where libgcc_eh.a lives under Cygwin and > execute the command on that file. It should be somewhere amongst the > stuff installed with gcc, but the exact location can vary quite a bit. > > Cheers. > > Tim. > -- Thanks, Brian Herman college.nfshost.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From brianherman at gmail.com Tue Jul 30 06:28:38 2013 From: brianherman at gmail.com (Brian Herman) Date: Tue, 30 Jul 2013 08:28:38 -0500 Subject: [LLVMdev] LLVM and Cygwin In-Reply-To: References: <51F7B6CB.80603@free.fr> Message-ID: Could it be an issue with my path? 
On Tue, Jul 30, 2013 at 8:19 AM, Brian Herman wrote: > brianherman at windows-8-doesn't rock /lib/gcc/i686-pc-cygwin/4.7.3 > $ nm libgcc_eh.a | grep register_frame > 000011b0 T ___deregister_frame > 000011a0 T ___deregister_frame_info > 000010d0 T ___deregister_frame_info_bases > 00000fe0 T ___register_frame > 00000fb0 T ___register_frame_info > 00000f40 T ___register_frame_info_bases > 00001070 T ___register_frame_info_table > 00001010 T ___register_frame_info_table_bases > 000010a0 T ___register_frame_table > > > On Tue, Jul 30, 2013 at 8:10 AM, Tim Northover wrote: > >> > $ nm libgcc_eh.a | grep register_frame >> > nm: 'libgcc_eh.a': No such file >> >> I think he meant to find out where libgcc_eh.a lives under Cygwin and >> execute the command on that file. It should be somewhere amongst the >> stuff installed with gcc, but the exact location can vary quite a bit. >> >> Cheers. >> >> Tim. >> > > > > -- > > > Thanks, > Brian Herman > college.nfshost.com > > > > > -- Thanks, Brian Herman college.nfshost.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From criswell at illinois.edu Tue Jul 30 07:01:21 2013 From: criswell at illinois.edu (John Criswell) Date: Tue, 30 Jul 2013 09:01:21 -0500 Subject: [LLVMdev] Instruction insertion By Module Pass In-Reply-To: References: Message-ID: <51F7C731.1040404@illinois.edu> On 7/30/13 7:44 AM, Rasha Omar wrote: > Hi, > I need to insert new instruction into every basic block like x=1 > or while loop > I tried this code, but it doesn't work > > Type * Int32Type = IntegerType::getInt32Ty(getGlobalContext()); > AllocaInst* newInst = new AllocaInst(Int32Type,"flag", Bb); > Bb->getInstList().push_back(newInst); The problem is that you've inserted the AllocaInst into the basic block via the AllocaInst constructor (note the Bb at the end of the line with new AllocaInst). You then attempt to insert the AllocaInst into the BasicBlock Bb a second time with the last line. 
Note that the assertion is telling you that you're inserting the alloca instruction twice. Remove the last line, and it should fix your problem. -- John T. > > the error: > void llvm::SymbolTableListTraits<ValueSubClass, ItemParentClass>::addNodeToList(ValueSubClass *) [ValueSubClass = > llvm::Instruction, ItemParentClass = llvm::BasicBlock]: Assertion > `V->getParent() == 0 && "Value already in a container!!"' failed. > > Is there a class I could use to insert while loop in Module Pass? > > Thank you in advance > > -- > *Rasha Salah Omar > Msc Student at E-JUST > Demonestrator at Faculty of Computers and Informatics > Benha University* > > *e-mail: rasha.omar at ejust.edu.eg * > > P* Please consider the environment before printing this email.* > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From criswell at illinois.edu Tue Jul 30 07:12:29 2013 From: criswell at illinois.edu (John Criswell) Date: Tue, 30 Jul 2013 09:12:29 -0500 Subject: [LLVMdev] Eliminating PHI with Identical Inputs Message-ID: <51F7C9CD.6070307@illinois.edu> Dear All, Is there a pass (or set of passes) that will replace a phi whose input operands all compute the same value with an instruction that computes that value? In other words, something that will convert: define internal i32 @function(i32 %x) { ... bb1: %y = add %x, 10 ... bb2: %z = add %x, 10 ... bb3: %phi = [bb1, %y], [bb2, %z] into define internal i32 @function(i32 %x) { ... bb1: ... bb2: ... bb3: %phi = add %x, 10 -- John T.
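The transformation John asks about is the heart of value numbering: give every computation a number derived from its opcode and its operands' numbers, so that `%y` and `%z` above receive the same number and the phi becomes trivially redundant. Below is a toy, self-contained sketch of just that core idea; the names and structure are hypothetical, and LLVM's actual GVN is far more involved (it must also prove the value is available along both paths and pick a legal insertion point for the surviving instruction):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Toy value-numbering table (hypothetical; not LLVM's GVN internals).
// Two computations receive the same number iff they have the same
// opcode and the same operand value numbers.
class ValueTable {
  std::map<std::tuple<std::string, unsigned, unsigned>, unsigned> Table;
  unsigned NextNum = 0;

public:
  // Leaves (arguments, constants) are modeled as zero-operand "opcodes".
  unsigned lookupOrAdd(const std::string &Opcode, unsigned LHS = 0,
                       unsigned RHS = 0) {
    auto Key = std::make_tuple(Opcode, LHS, RHS);
    auto It = Table.find(Key);
    if (It != Table.end())
      return It->second;  // same computation was seen before
    return Table[Key] = NextNum++;
  }
};
```

In the example above, `%y` and `%z` both map to the key (`add`, number of `%x`, number of `10`), so they share one value number; once both phi inputs carry the same number, the phi can be folded away and replaced by a single `add`.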
From kparzysz at codeaurora.org Tue Jul 30 07:35:40 2013 From: kparzysz at codeaurora.org (Krzysztof Parzyszek) Date: Tue, 30 Jul 2013 09:35:40 -0500 Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: <35E073FE-A0E4-4DE9-808C-050EEEDE00CE@apple.com> References: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> <51F692C1.2080901@codeaurora.org> <35E073FE-A0E4-4DE9-808C-050EEEDE00CE@apple.com> Message-ID: <51F7CF3C.9060603@codeaurora.org> On 7/29/2013 6:28 PM, Andrew Trick wrote: > > You mean that LICM and Unswitching should be left for later? For the purpose of exposing scalar optimizations, I'm not sure I agree with that but I'd be interested in examples. Optimizations like LICM, and unswitching can potentially damage perfect nesting of loops. For example, consider this nest: for (i) { for (j) { ... = A[i]; } } The load of A[i] is invariant in the j-loop (assuming no aliased stores), and even if it was not invariant, the address computation code is. A pass of LICM could then generate something like this: for (i) { t = A[i]; for (j) { ... = t; } } This is no longer a perfect nest, and a variety of loop nest optimizations will no longer be applicable. In general, nest optimizations have a greater potential for improving code performance than scalar optimizations, and should be given priority over them. In most cases (excluding loop interchange, for example), the LICM opportunity will remain, and can be taken care of later. An example of where the target-dependent factors come into the picture before the target-specific stage (target-specific optimizations, lowering, etc.), may be loop distribution. The example below does not belong to the nest optimizations, but the generic target-independent scalar optimizations will still apply after this transformation. 
Say that we have a loop that performs several computations, some of which may be amenable to more aggressive optimizations: for (i) { v = 1.0/(1.0 + A[i]*A[i]); B[i] = (1.0 - B[i])*v } Suppose that we have hardware that can perform very fast FMA operations (possibly vectorized). We could transform the loop into for (i) { t = 1.0 + A[i]*A[i]; v = 1.0/t; B[i] = (1.0 - B[i])*v; } And then expand t into an array, and distribute the loop: for (i) { t[i] = 1.0 + A[i]*A[i]; } for (i) { v = 1.0/t[i]; B[i] = (1.0 - B[i])*v; } The first loop may then be unrolled and vectorized, while the second one may simply be left alone. This may be difficult to address in a target-independent way (via a generic TTI interface), since we are trying to identify a specific pattern that, for the specific hardware, may be worth extracting out of a more complex loop. In addition to that, for this example to make sense, the vectorization passes would have to run afterwards, which would then put a bound on how late this transformation could be done. I guess the approach of being able to extend/modify what the PMB creates, that you mention below, would address this problem. > I think you're only worried about the impact on loop nest optimizations. Admittedly I'm not making much concession for that, because I think of loop nest optimization as a different tool that will probably want fairly substantial changes to the pass pipeline anyway. Yes, I agree. My major concern is that we need to be careful that we don't accidentally make things harder for ourselves. I very much agree with the canonicalization approach, and loop nests can take a bit of preparation (even including loop distribution). At the same time, there are other optimizations that will destroy the canonical structures (of loops, loop nests, etc.), that also need to take place at some point.
There should be a place in the optimization sequence, where the "destructive" optimizations have not yet taken place, and where the "constructive" (canonical) passes can be executed. The farther down the optimization stream we push the "destructive" ones, the more flexibility we will have as to what types of transformations can be done that require the code to be in a canonical form. > Here's a few of ways it might work: > (1) Loop nest optimizer extends the standard PMB by plugging in its own passes prior to Generic Loop Opts in addition to loading TTI. The loop nest optimizer's passes are free to query TTI: > > (2) Loop nest optimizer suppresses generic loop opts through a PMB flag (assuming they are too disruptive). It registers its own loop passes with the Target Loop Opts. It registers instances of generic loop opts to now run after loop nest optimization, and registers new instances of scalar opts to rerun after Target Loop Opts if needed. > > (3) If the loop nest optimizer were part of llvm core libs, then we could have a completely separate passmanager builder for it. All of these approaches would work (even concurrently). I think (3) could potentially be the future goal. > Are you afraid that LICM and unswitching will obfuscate the loops to the point that you can’t recognize the idiom? The current pass pipeline would have the same problem. Actually, in this case my concern was the interleaving of the target-dependent code with target-independent code (if we were to do all idiom recognition in the same pass), or code duplication (if the target-independent and target-dependent passes were to be separate). The more I think about it, the more I favor the "separate" approach, since the target-specific idioms may be very different for different targets, and there doesn't seem to be much that can be handled in a common code (hence not a lot of actual duplication would happen). -Krzysztof -- Qualcomm Innovation Center, Inc. 
is a member of Code Aurora Forum, hosted by The Linux Foundation From baldrick at free.fr Tue Jul 30 07:46:15 2013 From: baldrick at free.fr (Duncan Sands) Date: Tue, 30 Jul 2013 16:46:15 +0200 Subject: [LLVMdev] Eliminating PHI with Identical Inputs In-Reply-To: <51F7C9CD.6070307@illinois.edu> References: <51F7C9CD.6070307@illinois.edu> Message-ID: <51F7D1B7.5080904@free.fr> Hi John, On 30/07/13 16:12, John Criswell wrote: > Dear All, > > Is there a pass (or set of passes) that will replace a phi whose input operands > all compute the same value with an instruction that computes that value? In > other words, something that will convert: > > define internal i32 @function(i32 %x) { > ... > bb1: > %y = add %x, 10 > ... > bb2: > %z = add %x, 10 > ... > bb3: > %phi = [bb1, %y], [bb2, %z] > > into > > define internal i32 @function(i32 %x) { > ... > bb1: > ... > bb2: > ... > bb3: > %phi = add %x, 10 yes, GVN should replace %y and %z with the same register, giving something like %same = add %x, 10 ... bb1: ... bb2: ... bb3: %phi = [bb1, %same], [bb2, %same] at which point the instcombine pass should zap the phi, though maybe GVN will get it already. Ciao, Duncan. From Milind.Chabbi at rice.edu Tue Jul 30 08:22:29 2013 From: Milind.Chabbi at rice.edu (Milind Chabbi) Date: Tue, 30 Jul 2013 08:22:29 -0700 Subject: [LLVMdev] LLVM (opt) -profile-verifier is not pass resilient Message-ID: I compiled SPEC CPU2006 bzip2 with Clang, and generated profiles with OPT's -insert-optimal-edge-profiling option. After a profile run, I launched OPT with -profile-loader -profile-verifier flags and also passed -O3 flag. This caused OPT to give a warning "WARNING: profile information is inconsistent with the current program!" 
and then fail with an assert (ASSERT:inWeight and outWeight do not match opt: ProfileVerifierPass.cpp:226: void ProfileVerifierPassT<FType, BType>::CheckValue(bool, const char*, ProfileVerifierPassT<FType, BType>::DetailedBlockInfo*) [with FType = llvm::Function, BType = llvm::BasicBlock]: Assertion `0 && (Message)' failed.) Instead of passing -O3, if I pass only one optimization pass with the profile, then there is no warning/assert in the profile verification. Is this because different passes of -O3 are modifying the CFG in a way that makes it inconsistent with the originally produced profile? Is this expected behavior? Can't we make profiles (and profile verification) resilient to transformations? -Milind From criswell at illinois.edu Tue Jul 30 08:23:13 2013 From: criswell at illinois.edu (John Criswell) Date: Tue, 30 Jul 2013 10:23:13 -0500 Subject: [LLVMdev] Eliminating PHI with Identical Inputs In-Reply-To: <51F7D1B7.5080904@free.fr> References: <51F7C9CD.6070307@illinois.edu> <51F7D1B7.5080904@free.fr> Message-ID: <51F7DA61.2000605@illinois.edu> On 7/30/13 9:46 AM, Duncan Sands wrote: > Hi John, > > On 30/07/13 16:12, John Criswell wrote: >> Dear All, >> >> Is there a pass (or set of passes) that will replace a phi whose >> input operands >> all compute the same value with an instruction that computes that >> value? In >> other words, something that will convert: >> >> define internal i32 @function(i32 %x) { >> ... >> bb1: >> %y = add %x, 10 >> ... >> bb2: >> %z = add %x, 10 >> ... >> bb3: >> %phi = [bb1, %y], [bb2, %z] >> >> into >> >> define internal i32 @function(i32 %x) { >> ... >> bb1: >> ... >> bb2: >> ... >> bb3: >> %phi = add %x, 10 > > yes, GVN should replace %y and %z with the same register, giving > something like > > %same = add %x, 10 > ... > bb1: > ... > bb2: > ... > bb3: > %phi = [bb1, %same], [bb2, %same] > > at which point the instcombine pass should zap the phi, though maybe > GVN will > get it already. Odd.
I'm running GVN and then InstCombine, and it doesn't fix the problem. Is this new behavior added since LLVM 3.2, or is this something that GVN has been doing for awhile? -- John T. > > Ciao, Duncan. > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From baldrick at free.fr Tue Jul 30 08:28:56 2013 From: baldrick at free.fr (Duncan Sands) Date: Tue, 30 Jul 2013 17:28:56 +0200 Subject: [LLVMdev] Eliminating PHI with Identical Inputs In-Reply-To: <51F7DA61.2000605@illinois.edu> References: <51F7C9CD.6070307@illinois.edu> <51F7D1B7.5080904@free.fr> <51F7DA61.2000605@illinois.edu> Message-ID: <51F7DBB8.5000505@free.fr> Hi John, On 30/07/13 17:23, John Criswell wrote: > On 7/30/13 9:46 AM, Duncan Sands wrote: >> Hi John, >> >> On 30/07/13 16:12, John Criswell wrote: >>> Dear All, >>> >>> Is there a pass (or set of passes) that will replace a phi whose input operands >>> all compute the same value with an instruction that computes that value? In >>> other words, something that will convert: >>> >>> define internal i32 @function(i32 %x) { >>> ... >>> bb1: >>> %y = add %x, 10 >>> ... >>> bb2: >>> %z = add %x, 10 >>> ... >>> bb3: >>> %phi = [bb1, %y], [bb2, %z] >>> >>> into >>> >>> define internal i32 @function(i32 %x) { >>> ... >>> bb1: >>> ... >>> bb2: >>> ... >>> bb3: >>> %phi = add %x, 10 >> >> yes, GVN should replace %y and %z with the same register, giving something like >> >> %same = add %x, 10 >> ... >> bb1: >> ... >> bb2: >> ... >> bb3: >> %phi = [bb1, %same], [bb2, %same] >> >> at which point the instcombine pass should zap the phi, though maybe GVN will >> get it already. > > Odd. I'm running GVN and then InstCombine, and it doesn't fix the problem. > > Is this new behavior added since LLVM 3.2, or is this something that GVN has > been doing for awhile? I think it's been doing it forever. 
If it isn't doing it, then maybe I'm wrong about GVN doing this, but it could also be due to the topology of the CFG. After all, it has to put %same somewhere. If there isn't a good basic block to drop it in, then it may just give up. Can you provide a complete example that shows this problem? Ciao, Duncan. > > -- John T. > >> >> Ciao, Duncan. >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From Jeroen.Dobbelaere at synopsys.com Tue Jul 30 08:26:53 2013 From: Jeroen.Dobbelaere at synopsys.com (Jeroen Dobbelaere) Date: Tue, 30 Jul 2013 15:26:53 +0000 Subject: [LLVMdev] using f32 in a 64bit integer only architecture Message-ID: <19541D745B8D4845AA4E0B97499258E4553BD14A@DE02WEMBXB.internal.synopsys.com> Hi, I am working on a 64bit architecture where only 'i64' is valid (no hardware floating point support) and I am triggering a 'Promote may not follow Expand or Promote' assertion failure. (TargetLowering.h : getTypeConversion) When I look into it, I see that the conversion fails because llvm tries to convert a 'f32' into a 'i32' through a TypeSoftenFloat. As i32 needs promotion to i64, this assertion is triggered. The other way around: a 32bit architecture (only i32 is valid) with doubles (f64) seems to work just fine. So, it seems that a 'TypeSoftenFloat' followed by an 'Expand' is valid (f64->i64->i32) Note: - this is for llvm-3.3 - In TargetLoweringBase.cpp , I find that (when f32 and f64 are not valid): -- the default action for 'f64' is to convert it into a 'i64' -- the default action for 'f32' is to convert it into a 'i32' Is this kind of support easy to add ? What would be the places that need to be updated ? 
(I tried with adding a direct promotion from f32 into i64 in TargetLoweringBase.cpp, but that seems to trigger other issues, so I am not sure if that is the right way to go) Greetings, Jeroen Dobbelaere From micah.villmow at smachines.com Tue Jul 30 08:50:23 2013 From: micah.villmow at smachines.com (Micah Villmow) Date: Tue, 30 Jul 2013 15:50:23 +0000 Subject: [LLVMdev] using f32 in a 64bit integer only architecture In-Reply-To: <19541D745B8D4845AA4E0B97499258E4553BD14A@DE02WEMBXB.internal.synopsys.com> References: <19541D745B8D4845AA4E0B97499258E4553BD14A@DE02WEMBXB.internal.synopsys.com> Message-ID: <3947CD34E13C4F4AB2D94AD35AE3FE6007497C0C@smi-exchange1.smi.local> Jeroen, This most likely is the case that no one has ran into this situation before and there was no need to implement this code path. I ran into this quite often when working on the GPU backend. Micah > -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Jeroen Dobbelaere > Sent: Tuesday, July 30, 2013 8:27 AM > To: llvmdev at cs.uiuc.edu > Subject: [LLVMdev] using f32 in a 64bit integer only architecture > > Hi, > > I am working on a 64bit architecture where only 'i64' is valid (no hardware > floating point support) and I am triggering a 'Promote may not follow Expand > or Promote' assertion failure. > (TargetLowering.h : getTypeConversion) > > When I look into it, I see that the conversion fails because llvm tries to > convert a 'f32' > into a 'i32' through a TypeSoftenFloat. > As i32 needs promotion to i64, this assertion is triggered. > > The other way around: a 32bit architecture (only i32 is valid) with doubles > (f64) seems to work just fine. 
So, it seems that a 'TypeSoftenFloat' followed > by an 'Expand' is valid (f64->i64->i32) > > Note: > - this is for llvm-3.3 > - In TargetLoweringBase.cpp , I find that (when f32 and f64 are not valid): > -- the default action for 'f64' is to convert it into a 'i64' > -- the default action for 'f32' is to convert it into a 'i32' > > Is this kind of support easy to add ? What would be the places that need to > be updated ? > (I tried with adding a direct promotion from f32 into i64 in > TargetLoweringBase.cpp, but that seems to trigger other issues, so I am not > sure if that is the right way to go) > > Greetings, > > Jeroen Dobbelaere > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From tanmx_star at yeah.net Tue Jul 30 10:03:11 2013 From: tanmx_star at yeah.net (Star Tan) Date: Wed, 31 Jul 2013 01:03:11 +0800 (CST) Subject: [LLVMdev] [Polly] Update of Polly compile-time performance on LLVM test-suite Message-ID: <6a634f6b.e.14030889341.Coremail.tanmx_star@yeah.net> Hi Tobias and all Polly developers, I have re-evaluated the Polly compile-time performance using newest LLVM/Polly source code. You can view the results on http://188.40.87.11:8000. Especially, I also evaluated our r187102 patch file that avoids expensive failure string operations in normal execution. Specifically, I evaluated two cases for it: Polly-NoCodeGen: clang -O3 -load LLVMPolly.so -mllvm -polly-optimizer=none -mllvm -polly-code-generator=none http://188.40.87.11:8000/db_default/v4/nts/16?compare_to=9&baseline=9&aggregation_fn=median Polly-Opt: clang -O3 -load LLVMPolly.so -mllvm -polly http://188.40.87.11:8000/db_default/v4/nts/18?compare_to=11&baseline=11&aggregation_fn=median The "Polly-NoCodeGen" case is mainly used to compare the compile-time performance for the polly-detect pass. 
As shown in the results, our patch file could significantly reduce the compile-time overhead for some benchmarks, such as tramp3dv4 (24.2%), simple_types_constant_folding (12.6%), oggenc (9.1%) and loop_unroll (7.8%).

The "Polly-Opt" case is used to compare the whole compile-time performance of Polly. Since our patch file mainly affects the Polly-Detect pass, it shows similar performance to "Polly-NoCodeGen". As shown in the results, it reduces the compile-time overhead of some benchmarks, such as tramp3dv4 (23.7%), simple_types_constant_folding (12.9%), oggenc (8.3%) and loop_unroll (7.5%).

Finally, I also evaluated the performance of the ScopBottomUp patch, which changes the top-down scop detection into bottom-up scop detection. Results can be viewed at:
pNoCodeGen-ScopBottomUp: clang -O3 -load LLVMPolly.so (v.s. LLVMPolly-ScopBottomUp.so) -mllvm -polly-optimizer=none -mllvm -polly-code-generator=none
http://188.40.87.11:8000/db_default/v4/nts/21?compare_to=16&baseline=16&aggregation_fn=median
pOpt-ScopBottomUp: clang -O3 -load LLVMPolly.so (v.s. LLVMPolly-ScopBottomUp.so) -mllvm -polly
http://188.40.87.11:8000/db_default/v4/nts/19?compare_to=18&baseline=18&aggregation_fn=median
(*Both of these results are based on LLVM r187116, which includes the r187102 patch file that we discussed above.)

Please note that this patch file leads to some errors in the Polly tests, so the data shown here cannot be regarded as reliable results. For example, the patch significantly reduces the compile-time overhead of SingleSource/Benchmarks/Shootout/nestedloop only because it regards the nested loop as an invalid scop and skips all following transformations and optimizations. However, I evaluated it here to get a sense of its potential performance impact. Based on the results shown at http://188.40.87.11:8000/db_default/v4/nts/21?compare_to=16&baseline=16&aggregation_fn=median, we can see that detecting scops bottom-up may further reduce Polly compile time by more than 10%.
Best wishes,
Star Tan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From shuxin.llvm at gmail.com  Tue Jul 30 10:19:03 2013
From: shuxin.llvm at gmail.com (Shuxin Yang)
Date: Tue, 30 Jul 2013 10:19:03 -0700
Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man
In-Reply-To: <51F7CF3C.9060603@codeaurora.org>
References: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> <51F692C1.2080901@codeaurora.org> <35E073FE-A0E4-4DE9-808C-050EEEDE00CE@apple.com> <51F7CF3C.9060603@codeaurora.org>
Message-ID: <51F7F587.30304@gmail.com>

On 7/30/13 7:35 AM, Krzysztof Parzyszek wrote:
> On 7/29/2013 6:28 PM, Andrew Trick wrote:
>>
>> You mean that LICM and Unswitching should be left for later? For the
>> purpose of exposing scalar optimizations, I'm not sure I agree with
>> that but I'd be interested in examples.
>
> Optimizations like LICM, and unswitching can potentially damage
> perfect nesting of loops. For example, consider this nest:
>
> for (i) {
>   for (j) {
>     ... = A[i];
>   }
> }
>
> The load of A[i] is invariant in the j-loop (assuming no aliased
> stores), and even if it was not invariant, the address computation
> code is. A pass of LICM could then generate something like this:
>
> for (i) {
>   t = A[i];
>   for (j) {
>     ... = t;
>   }
> }
>
> This is no longer a perfect nest, and a variety of loop nest
> optimizations will no longer be applicable. In general, nest
> optimizations have a greater potential for improving code performance
> than scalar optimizations, and should be given priority over them. In
> most cases (excluding loop interchange, for example), the LICM
> opportunity will remain, and can be taken care of later.
>
Yes, LICM will make a perfect nest become imperfect. When I defined the pre-ipo passes, I took this into account as well. I thought about it for a while, and was not able to figure out any strong reason for running, or not running, LICM in the pre-ipo pass or in any other compilation phase before LNO.
The pro of running LICM early is that it may move big redundant computations out of the loop nest. You never know how big they are. If you are lucky, you can move a lot of stuff out of the loop; the loop may become much smaller and hence enable lots of downstream optimizations. This sounds like a big win for control-intensive programs, where loop-nest optimization normally is a big, expensive no-op.

The con side is that, as you said, the nest is not perfect any more. However, I would argue that LNO optimizations should be able to tackle the cases where the imperfect part is simple enough (say, no calls, no control flow, etc.). (FYI, Open64's LNO is able to tackle imperfect nesting so long as the imperfect part is simple.) Or you can just reverse the LICM; that doesn't sound hard. A similar argument applies to unswitching: you can run fusion to reverse the unswitching, and you certainly need that optimization anyway to transform the input nest into an appropriate form.

While this may sound a bit lame to HPC-minded folks, the majority care about the performance of control-intensive programs. The design has to make the latter happy :-)

From atrick at apple.com  Tue Jul 30 10:41:25 2013
From: atrick at apple.com (Andrew Trick)
Date: Tue, 30 Jul 2013 10:41:25 -0700
Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man
In-Reply-To: <51F7CF3C.9060603@codeaurora.org>
References: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> <51F692C1.2080901@codeaurora.org> <35E073FE-A0E4-4DE9-808C-050EEEDE00CE@apple.com> <51F7CF3C.9060603@codeaurora.org>
Message-ID: <628496A2-7BC5-43E7-A702-024483CFBE05@apple.com>

On Jul 30, 2013, at 7:35 AM, Krzysztof Parzyszek wrote:
> On 7/29/2013 6:28 PM, Andrew Trick wrote:
>>
>> You mean that LICM and Unswitching should be left for later? For the purpose of exposing scalar optimizations, I'm not sure I agree with that but I'd be interested in examples.
>
> Optimizations like LICM, and unswitching can potentially damage perfect nesting of loops. For example, consider this nest:
>
> for (i) {
>   for (j) {
>     ...
= A[i]; > } > } > > The load of A[i] is invariant in the j-loop (assuming no aliased stores), and even if it was not invariant, the address computation code is. A pass of LICM could then generate something like this: > > for (i) { > t = A[i]; > for (j) { > ... = t; > } > } > > This is no longer a perfect nest, and a variety of loop nest optimizations will no longer be applicable. In general, nest optimizations have a greater potential for improving code performance than scalar optimizations, and should be given priority over them. In most cases (excluding loop interchange, for example), the LICM opportunity will remain, and can be taken care of later. > > > An example of where the target-dependent factors come into the picture before the target-specific stage (target-specific optimizations, lowering, etc.), may be loop distribution. The example below does not belong to the nest optimizations, but the generic target-independent scalar optimizations will still apply after this transformation. > Say that we have a loop that performs several computations, some of which may be amenable to more aggressive optimizations: > > for (i) { > v = 1.0/(1.0 + A[i]*A[i]); > B[i] = (1.0 - B[i])*v > } > > Suppose that we have a hardware that can perform very fast FMA operations (possibly vectorized). We could transform the loop into > > for (i) { > t = 1.0 + A[i]*A[i]; > v = 1.0/t; > B[i] = (1.0 - B[i])*v; > } > > And then expand t into an array, and distribute the loop: > > for (i) { > t[i] = 1.0 + A[i]*A[i]; > } > for (i) { > v = 1.0/t[i]; > B[i] = (1.0 - B[i])*v; > } > > The first loop may then be unrolled and vectorized, while the second one may simply be left alone. > > This may be difficult to address in a target-independent way (via a generic TTI interface), since we are trying to identify a specific pattern that, for the specific hardware, may be worth extracting out of a more complex loop. 
In addition to that, for this example to make sense, the vectorization passes would have to run afterwards, which would then put a bound on how late this transformation could be done. > > I guess the approach of being able to extend/modify what the PMB creates, that you mention below, would address this problem. > > >> I think you're only worried about the impact on loop nest optimizations. Admittedly I'm not making much concessesion for that, because I think of loop nest optimization as a different tool that will probably want fairly substantial changes to the pass pipeline anyway. > > Yes, I agree. My major concern is that we need to be careful that we don't accidentally make things harder for ourselves. I very much agree with the canonicalization approach, and loop nests can take a bit of preparation (even including loop distribution). At the same time, there are other optimizations that will destroy the canonical structures (of loops, loop nests, etc.), that also need to take place at some point. There should be a place in the optimization sequence, where the "destructive" optimizations have not yet taken place, and where the "constructive" (canonical) passes can be executed. The farther down the optimization stream we push the "destructive" ones, the more flexibility we will have as to what types of transformations can be done that require the code to be in a canonical form. Thanks Krysztof. I appreciate your input and examples. It’s difficult to balance things like Unswitching and LICM that clearly need to run early to expose scalar opts, but in some sense are destructive. I agree with Shuxin’s analysis that these transformations can be reversed if the effort is made. Those developing experimental LNO tools can use any of the mechanisms described below to work around it. >> Here's a few of ways it might work: >> (1) Loop nest optimizer extends the standard PMB by plugging in its own passes prior to Generic Loop Opts in addition to loading TTI. 
The loop nest optimizer's passes are free to query TTI: >> >> (2) Loop nest optimizer suppresses generic loop opts through a PMB flag (assuming they are too disruptive). It registers its own loop passes with the Target Loop Opts. It registers instances of generic loop opts to now run after loop nest optimization, and registers new instances of scalar opts to rerun after Target Loop Opts if needed. >> >> (3) If the loop nest optimizer were part of llvm core libs, then we could have a completely separate passmanager builder for it. > > All of these approaches would work (even concurrently). I think (3) could potentially be the future goal. > > >> Are you afraid that LICM and unswitching will obfuscate the loops to the point that you can’t recognize the idiom? The current pass pipeline would have the same problem. > > Actually, in this case my concern was the interleaving of the target-dependent code with target-independent code (if we were to do all idiom recognition in the same pass), or code duplication (if the target-independent and target-dependent passes were to be separate). The more I think about it, the more I favor the "separate" approach, since the target-specific idioms may be very different for different targets, and there doesn't seem to be much that can be handled in a common code (hence not a lot of actual duplication would happen). I think we may want to run some generic loop idiom recognition early (MemCpy/MemSet) because it may aid subsequent analysis. Shuxin described this as a kind of “IR promotion” which is a good way to look at it. However, targets do want to plugin custom idioms and I think those can run later, separately. To deal with pass ordering problems introduced by target-specific opts, targets should be able to rerun generic passes if it’s worth the compile time. 
There are a couple things we should improve to make this more practical: - The generic, canonicalization passes should be able to run in a mode that limits transformations to those that are locally profitable. We don’t want to canonicalize late and defeat earlier optimizations. - We should expose more generic optimizations as utilities that can run on-demand for a given block or loop (GVN, LICM). -Andy From qcolombet at apple.com Tue Jul 30 09:52:37 2013 From: qcolombet at apple.com (Quentin Colombet) Date: Tue, 30 Jul 2013 09:52:37 -0700 Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. In-Reply-To: References: Message-ID: <86A7F65B-6CE7-4A41-A8A5-D20623BE9A41@apple.com> Hi Paul, Thanks for sharing your patches. This is interesting. Cheers, -Quentin On Jul 29, 2013, at 2:40 PM, Redmond, Paul wrote: > Hi, > > Several weeks ago a prototyped a feature similar to what you're describing. I was experimenting to see how one might implement a feature like ICC's -vec–report feature in clang/llvm. My approach was to create an ImmutablePass which stores notes. I modified the loop vectorizer and the unroll pass to add notes when loops were vectorized or unrolled. > > On the clang side I add an OptReport to the pass manager and dump out the notes as diagnostics. It worked ok as a prototype but getting the source locations correct was a bit fragile. > > I've attached some patches in case you're interested. > > Paul > > > > From: Quentin Colombet > > Date: Tuesday, 16 July, 2013 8:21 PM > To: LLVM Developers Mailing List > > Subject: [LLVMdev] [RFC] Add warning capabilities in LLVM. > > Hi, > > I would like to start a discussion about error/warning reporting in LLVM and how we can extend the current mechanism to take advantage of clang capabilities. > > > ** Motivation ** > > Currently LLVM provides a way to report error either directly (print to stderr) or by using a user defined error handler. 
For instance, in inline asm parsing, we can specify the diagnostic handler to report the errors in clang. > > The basic idea would be to be able to do that for warnings too (and for other kind of errors?). > A motivating example can be found with the following link where we want LLVM to be able to warn on the stack size to help developing kernels: > http://llvm.org/bugs/show_bug.cgi?id=4072 > > By adding this capability, we would be able to have access to all the nice features clang provides with warnings: > - Promote it to an error. > - Ignore it. > > > ** Challenge ** > > To be able to take advantage of clang framework for warning/error reporting, warnings have to be associated with warning groups. > Thus, we need a way for the backend to specify a front-end warning type. > > The challenge is, AFAICT (which is not much, I admit), that front-end warning types are statically handled using tablegen representation. > > > ** Advices Needed ** > > 1. Decide whether or not we want such capabilities (if we do not we may just add sporadically the support for a new warning/group of warning/error). > 2. Come up with a plan to implement that (assuming we want it). > > > Thanks for the feedbacks. > > Cheers, > > -Quentin > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bintzeng at gmail.com Tue Jul 30 10:55:38 2013 From: bintzeng at gmail.com (Bin Tzeng) Date: Tue, 30 Jul 2013 10:55:38 -0700 Subject: [LLVMdev] Disable memset synthesization Message-ID: Hi all, LLVM is smart that it can synthesize llvm.memset, llvm.memcpy etc. from loops, which can be lowered into calls to memset, memcpy and so on. Is there an option that can disable this optimization? For some cases, I do not want the code to depend on libc. Thanks in advance! Bin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From letz at grame.fr Tue Jul 30 10:56:47 2013 From: letz at grame.fr (=?iso-8859-1?Q?St=E9phane_Letz?=) Date: Tue, 30 Jul 2013 19:56:47 +0200 Subject: [LLVMdev] Strange crash with LLVM 3.3 Message-ID: Hi, We are embedding our DSL language + LLVM in a modified WebKit based Safari on OSX. Starting with LLVM 3.3 (it was working with LLVM 3.1...) we see the following crash: Any idea? Thanks. Stéphane Letz ====================== Process: SafariForWebKitDevelopment [79228] Path: /Applications/Safari.app/Contents/MacOS/SafariForWebKitDevelopment Identifier: SafariForWebKitDevelopment Version: 7536.30.1 Code Type: X86-64 (Native) Parent Process: perl5.12 [79212] User ID: 501 Date/Time: 2013-07-30 19:16:36.081 +0200 OS Version: Mac OS X 10.8.4 (12E55) Report Version: 10 Sleep/Wake UUID: 76D14C5C-4635-4D22-83AD-A997908C17BB Crashed Thread: 0 Dispatch queue: com.apple.main-thread Exception Type: EXC_CRASH (SIGABRT) Exception Codes: 0x0000000000000000, 0x0000000000000000 Application Specific Information: /opt/local/libexec/llvm-3.3/lib/libLLVM-3.3.dylib *** error for object 0x7fff5a075bc0: pointer being freed was not allocated Thread 0 Crashed:: Dispatch queue: com.apple.main-thread 0 libsystem_kernel.dylib 0x00007fff98ba8d46 __kill + 10 1 libsystem_c.dylib 0x00007fff99668df0 abort + 177 2 libsystem_c.dylib 0x00007fff9963c9b9 free + 392 3 libLLVM-3.3.dylib 0x000000010c06fb4c _GLOBAL__I_a + 412 4 dyld 0x00007fff6579b378 ImageLoaderMachO::doModInitFunctions(ImageLoader::LinkContext const&) + 236 5 dyld 0x00007fff6579b762 ImageLoaderMachO::doInitialization(ImageLoader::LinkContext const&) + 46 6 dyld 0x00007fff6579806e ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&) + 380 7 dyld 0x00007fff65797fc4 ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&) + 210 8 dyld 0x00007fff65797fc4 
ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&) + 210 9 dyld 0x00007fff65797fc4 ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&) + 210 10 dyld 0x00007fff65797fc4 ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&) + 210 11 dyld 0x00007fff65797fc4 ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&) + 210 12 dyld 0x00007fff65797eba ImageLoader::runInitializers(ImageLoader::LinkContext const&, ImageLoader::InitializerTimingList&) + 54 13 dyld 0x00007fff65789fc0 dyld::initializeMainExecutable() + 207 14 dyld 0x00007fff6578db04 dyld::_main(macho_header const*, unsigned long, int, char const**, char const**, char const**, unsigned long*) + 3060 15 dyld 0x00007fff65789397 dyldbootstrap::start(macho_header const*, int, char const**, long, macho_header const*, unsigned long*) + 761 16 dyld 0x00007fff6578905e _dyld_start + 54 Thread 0 crashed with X86 Thread State (64-bit): rax: 0x0000000000000000 rbx: 0x00007fff5a075b00 rcx: 0x00007fff5a075ae8 rdx: 0x0000000000000000 rdi: 0x000000000001357c rsi: 0x0000000000000006 rbp: 0x00007fff5a075b10 rsp: 0x00007fff5a075ae8 r8: 0x0000000000000000 r9: 0x0000000000000000 r10: 0x00007fff98baa342 r11: 0x0000000000000202 r12: 0x0000000105b91000 r13: 0x0000000105b8e000 r14: 0x00007fff5a075bc0 r15: 0x0000000000000003 rip: 0x00007fff98ba8d46 rfl: 0x0000000000000202 cr2: 0x00007fff7e8f8ff0 Logical CPU: 0 Binary Images: 0x105b88000 - 0x105b88fff +SafariForWebKitDevelopment (7536.30.1) <7E1AB8E9-8D8B-3A43-8E63-7C92529C507F> /Applications/Safari.app/Contents/MacOS/SafariForWebKitDevelopment 0x105b93000 - 0x105f72ff7 com.apple.JavaScriptCore (538+ - 538.1+) <1AF4B0D6-B929-34DB-8B30-C1F87E220A48> /Documents/*/JavaScriptCore.framework/Versions/A/JavaScriptCore 0x10616e000 - 
0x1062d8ff7 com.apple.WebKit (538+ - 538.1+) /Documents/*/WebKit.framework/Versions/A/WebKit 0x106476000 - 0x1066e9ff7 com.apple.WebKit2 (538+ - 538.1+) <83A89E5D-B5D7-3EA7-8A19-94108C65C448> /Documents/*/WebKit2.framework/Versions/A/WebKit2 0x106af3000 - 0x107bc4ff7 com.apple.WebCore (538+ - 538.1+) <0CAD83F7-1E5C-369F-9DAF-353E72211959> /Documents/*/WebCore.framework/Versions/A/WebCore 0x108fa2000 - 0x108fbaff7 +libHTTPDFaust.dylib (0) <5AF1DB20-7ACE-3C22-8377-C70FEEC70C9C> /usr/local/lib/faust/libHTTPDFaust.dylib 0x109013000 - 0x1091edff7 +libfaust.dylib (0) <1342D1BF-DF91-3A51-82B4-8046BB854AA8> /usr/local/lib/faust/libfaust.dylib 0x10a1a7000 - 0x10b0eef27 +libLLVM-3.1.dylib (0) /opt/local/libexec/*/libLLVM-3.1.dylib 0x10b81b000 - 0x10b8f0ff7 +libsqlite3.0.dylib (0) /opt/local/lib/libsqlite3.0.dylib 0x10b90a000 - 0x10ba44fff +libxml2.2.dylib (0) <272F9E53-982A-3384-AC78-CAA70AC73803> /opt/local/lib/libxml2.2.dylib 0x10ba7a000 - 0x10ba8bff7 +libz.1.dylib (0) /opt/local/lib/libz.1.dylib 0x10ba91000 - 0x10ba9fff7 +libmicrohttpd.10.dylib (0) <9217AF23-0AFE-39A8-8105-8F12621E60BF> /opt/local/lib/libmicrohttpd.10.dylib 0x10baa8000 - 0x10bb82fef +libgnutls.28.dylib (0) /opt/local/lib/libgnutls.28.dylib 0x10bbb0000 - 0x10bc21fff +libgcrypt.11.dylib (0) /opt/local/lib/libgcrypt.11.dylib 0x10bc39000 - 0x10bc3bff7 +libgpg-error.0.dylib (0) <29ABCAE4-FA5B-390F-911D-DDF823866B30> /opt/local/lib/libgpg-error.0.dylib 0x10bc3e000 - 0x10bc46ff7 +libintl.8.dylib (0) /opt/local/lib/libintl.8.dylib 0x10bc52000 - 0x10bd4aff7 +libiconv.2.dylib (0) <1914316E-52FA-3DC3-9CE7-6D2D71CFD4DB> /opt/local/lib/libiconv.2.dylib 0x10bd59000 - 0x10bd7ffff +libnettle.4.dylib (0) /opt/local/lib/libnettle.4.dylib From resistor at mac.com Tue Jul 30 11:00:58 2013 From: resistor at mac.com (Owen Anderson) Date: Tue, 30 Jul 2013 11:00:58 -0700 Subject: [LLVMdev] Disable memset synthesization In-Reply-To: References: Message-ID: <843A76E8-CF51-4BBC-9787-49F2F33F0CFB@mac.com> If you're using just LLVM, 
you can simply not run optimizations like LoopIdiomRecognizer that synthesize these operations. Assuming you mean clang+LLVM, you can use -ffreestanding to achieve most of what you want, but note that LLVM matches GCC in that it requires certain functions to have expansions available for linking. If that's not workable for your target, you'll need to modify clang itself to omit those optimizations. From http://gcc.gnu.org/onlinedocs/gcc/Standards.html > GCC requires the freestanding environment provide memcpy, memmove, memset and memcmp. --Owen On Jul 30, 2013, at 10:55 AM, Bin Tzeng wrote: > Hi all, > > LLVM is smart that it can synthesize llvm.memset, llvm.memcpy etc. from loops, which can be lowered into calls to memset, memcpy and so on. Is there an option that can disable this optimization? For some cases, I do not want the code to depend on libc. > > Thanks in advance! > Bin > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From kparzysz at codeaurora.org Tue Jul 30 11:01:46 2013 From: kparzysz at codeaurora.org (Krzysztof Parzyszek) Date: Tue, 30 Jul 2013 13:01:46 -0500 Subject: [LLVMdev] Disable memset synthesization In-Reply-To: References: Message-ID: <51F7FF8A.3060001@codeaurora.org> On 7/30/2013 12:55 PM, Bin Tzeng wrote: > Hi all, > > LLVM is smart that it can synthesize llvm.memset, llvm.memcpy etc. from > loops, which can be lowered into calls to memset, memcpy and so on. Is > there an option that can disable this optimization? For some cases, I do > not want the code to depend on libc. You can use -fno-builtin, if that suits your needs. -K -- Qualcomm Innovation Center, Inc. 
is a member of Code Aurora Forum, hosted by The Linux Foundation

From rafael.espindola at gmail.com  Tue Jul 30 11:07:40 2013
From: rafael.espindola at gmail.com (=?UTF-8?Q?Rafael_Esp=C3=ADndola?=)
Date: Tue, 30 Jul 2013 14:07:40 -0400
Subject: [LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI
In-Reply-To: 
References: 
Message-ID: 

How do you handle this during codegen? One problem is avoiding stack changes (like spills). Another is coordinating the things that use allocas with those that do not but still end up on the stack.

Consider

void foo(int arg1, int arg2, int arg3, ....CXXTypeWithCopyConstructor argn, int argp1...)

You will need an alloca for argn, but the ABI also requires it to be next to the plain integers that didn't fit in registers, no?

This is part of the reason my suggestion was to have a single opaque object representing the frame being constructed and a getelementptr-like abstraction to get pointers out of it.

On 25 July 2013 17:38, Reid Kleckner wrote:
> Hi LLVM folks,
>
> To properly implement pass-by-value in the Microsoft C++ ABI, we need to be
> able
> to take the address of an outgoing call argument slot. This is
> http://llvm.org/PR5064 .
>
> Problem
> -------
>
> On Windows, C structs are pushed right onto the stack in line with the other
> arguments. In LLVM, we use byval to model this, and it works for C structs.
> However, C++ records are also passed this way, and reusing byval for C++
> records
> breaks C++ object identity rules.
>
> In order to implement the ABI properly, we need a way to get the address of
> the
> argument slot *before* we start the call, so that we can either construct
> the
> object in place on the stack or at least call its copy constructor.
>
> This is further complicated by the possibility of nested calls passing
> arguments by
> value. A good general case to think about is a binary tree of calls that
> take
A good general case to think about is a binary tree of calls that > take > two arguments by value and return by value: > > struct A { int a; }; > A foo(A, A); > foo(foo(A(), A()), foo(A(), A())); > > To complete the outer call to foo, we have to adjust the stack for its > outgoing > arguments before the inner calls to foo, and arrange for the sret pointers > to > point to those slots. > > To make this even more complicated, C++ methods are typically callee cleanup > (thiscall), but free functions are caller cleanup (cdecl). > > Features > -------- > > A few weeks ago, I sat down with some folks at Google and we came up with > this > proposal, which tries to add the minimum set of LLVM IL features to make > this > possible. > > 1. Allow alloca instructions to use llvm.stacksave values to indicate > scoping. > > This creates an SSA dependence between the alloca instruction and the > stackrestore instruction that prevents optimizers from accidentally > reordering > them in ways that don't verify. llvm.stacksave in this case is taking on a > role > similar to CALLSEQ_START in the selection dag. > > LLVM can also apply this to dynamic allocas from inline functions to ensure > that > optimizers don't move them. > > 2. Add an 'alloca' attribute for parameters. > > Only an alloca value can be passed to a parameter with this attribute. It > cannot be bitcasted or GEPed. An alloca can only be passed in this way > once. > It can be passed as a normal pointer to any number of other functions. > > Aside from allocas bounded by llvm.stacksave and llvm.stackrestore calls, > there > can be no allocas between the creation of an alloca passed with this > attribute > and its associated call. > > 3. Add a stackrestore field to call and invoke instructions. > > This models calling conventions which do their own cleanup, and ensures that > even after optimizations have perturbed the IR, we don't consider the > allocas to > be live. 
For caller cleanup conventions, while the callee may have called > destructors on its arguments, the allocas can be considered live until the > stack > restore. > > Example > ------- > > A single call to foo, assuming it is stdcall, would be lowered something > like: > > %res = alloca %struct.A > %base = llvm.stacksave() > %arg1 = alloca %struct.A, stackbase %base > %arg2 = alloca %struct.A, stackbase %base > call @A_ctor(%arg1) > call @A_ctor(%arg2) > call x86_stdcallcc @foo(%res sret, %arg1 alloca, %arg2 alloca), stackrestore > %base > > If control does not flow through a call or invoke with a stackrestore field, > then manual calls to llvm.stackrestore must be emitted before another call > or > invoke can use an 'alloca' argument. The manual stack restore call ends the > lifetime of the allocas. This is necessary to handle unwind edges from > argument > expression evaluation as well as the case where foo is not callee cleanup. > > Implementation > -------------- > > By starting out with the stack save and restore intrinsics, we can hopefully > approach a slow but working implementation sooner rather than later. The > work > should mostly be in the verifier, the IR, its parser, and the x86 backend. > > I don't plan to start working on this immediately, but over the long run > this will be really important to support well. > > --- > > That's all! Please send feedback! This is admittedly a really complicated > feature and I'm sorry for inflicting it on the LLVM community, but it's > obviously beyond my control. 
> > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From bintzeng at gmail.com Tue Jul 30 11:19:38 2013 From: bintzeng at gmail.com (Bin Tzeng) Date: Tue, 30 Jul 2013 11:19:38 -0700 Subject: [LLVMdev] Disable memset synthesization In-Reply-To: <51F7FF8A.3060001@codeaurora.org> References: <51F7FF8A.3060001@codeaurora.org> Message-ID: Thanks! That works. On Tue, Jul 30, 2013 at 11:01 AM, Krzysztof Parzyszek < kparzysz at codeaurora.org> wrote: > On 7/30/2013 12:55 PM, Bin Tzeng wrote: > >> Hi all, >> >> LLVM is smart that it can synthesize llvm.memset, llvm.memcpy etc. from >> loops, which can be lowered into calls to memset, memcpy and so on. Is >> there an option that can disable this optimization? For some cases, I do >> not want the code to depend on libc. >> > > You can use -fno-builtin, if that suits your needs. > > -K > > -- > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted > by The Linux Foundation > > ______________________________**_________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/**mailman/listinfo/llvmdev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul_Robinson at playstation.sony.com Tue Jul 30 11:31:57 2013 From: Paul_Robinson at playstation.sony.com (Robinson, Paul) Date: Tue, 30 Jul 2013 18:31:57 +0000 Subject: [LLVMdev] llvm.org bug trend Message-ID: Over most of the past year, I have been keeping an eye on the overall LLVM.org open-bug count. Sampling the count (almost) every Monday morning, it is a consistently non-decreasing number. I thought I'd post something about it to the Dev lists, as the count broke 4000 this past week. For your entertainment here's a chart that Excel produced from the data. 
(To make it more dramatic, I carefully did not use a proper zero point on the X-axis.) I do not have per-category breakdowns, sorry, just the raw total. Makes me think more seriously about cruising the bug list for something that looks like I could actually fix it... --paulr [cid:image001.png at 01CE8C56.9A40D7E0] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 11909 bytes Desc: image001.png URL: From rnk at google.com Tue Jul 30 11:32:35 2013 From: rnk at google.com (Reid Kleckner) Date: Tue, 30 Jul 2013 11:32:35 -0700 Subject: [LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI In-Reply-To: References: Message-ID: On Tue, Jul 30, 2013 at 11:07 AM, Rafael Espíndola < rafael.espindola at gmail.com> wrote: > How do you handle this during codegen? One problem is avoid stack > changes (like spills) I'm not sure I understand your question, but my plan is to basically use a frame pointer when there is a call with an argument using the 'alloca' attribute. It'll be slow but functional. Later the backend can be optimized to be clever about spilling through an SP-based memory operand in the presence of stack adjustments. I don't yet have a concrete plan for this, and it will require more familiarity with the backend than I currently have. Another is coordinating things that are using > allocas and those that are not but end up in the stack. Consider > > void foo(int arg1, int arg2, int arg3, ....CXXTypeWithCopyConstructor > argn, int argp1...) > > You will need an alloca for argn, but the ABI also requires it to be > next to the plain integers that didn' fit in registers, no? This is > part of the reason my suggestion was to have a single opaque object > representing the frame being constructed and a getelementpointer like > abstraction to get pointers out of it. 
This proposal puts this complexity in the backend. The backend will lay out the outgoing argument slots as required by the ABI, and alloca pointer will be resolved to point to the appropriate outgoing argument slot. The verifier will be changed to reject IR with a live alloca between a call site with an alloca-attributed argument and the creation of that alloca. This will work however: %s1 = stacksave %1 = alloca stackbase %s1 %s2 = stacksave %2 = alloca stackbase %s2 call @foo(%2 alloca) stackrestore %s2 call @foo(%1 alloca) stackrestore %s1 Because the %2 alloca is dead due to the stack restore before the second foo call. I should also mention how this interacts with regparms. The win64 CC has 4 regparms, and if one of them is a struct, it is passed indirectly. Users can easily handle that in the frontend, and the backend could reject the alloca attribute on parameters that should be in registers. I need to double-check what happens for fastcall on x86_32. -------------- next part -------------- An HTML attachment was scrubbed... URL: From swlin at post.harvard.edu Tue Jul 30 11:41:55 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Tue, 30 Jul 2013 11:41:55 -0700 Subject: [LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI In-Reply-To: <878v0phswm.fsf@wanadoo.es> References: <51F6676E.2090800@free.fr> <51F670DF.5080804@free.fr> <878v0phswm.fsf@wanadoo.es> Message-ID: > Right. What's the point of all the effort devoted to MSVC++ ABI > compatibility when Clang doesn't need it for being a top-notch C++ > compiler on Windows? I brought up a similar point a little bit earlier, too.... 
It seems like the only necessary condition for being a first-class native-code development tool on Windows is to support the platform C ABI and the subset of the C++ ABI required by COM, and that is the most that any non-MS compiler on Windows tries to do, so I am genuinely curious why there is so much effort being spent on the more esoteric portions of the ABI? As far as I understand it, the case being considered here is not supported in COM, which constrains the types of parameter and return values at ABI boundaries to pointers, simple C types, and some special cases... Stephen From cbergstrom at pathscale.com Tue Jul 30 11:47:49 2013 From: cbergstrom at pathscale.com (=?windows-1252?Q?=22C=2E_Bergstr=F6m=22?=) Date: Wed, 31 Jul 2013 01:47:49 +0700 Subject: [LLVMdev] [cfe-dev] llvm.org bug trend In-Reply-To: References: Message-ID: <51F80A55.6090801@pathscale.com> On 07/31/13 01:31 AM, Robinson, Paul wrote: > > Over most of the past year, I have been keeping an eye on the overall > LLVM.org open-bug count. > > Sampling the count (almost) every Monday morning, it is a consistently > non-decreasing number. > > I thought I’d post something about it to the Dev lists, as the count > broke 4000 this past week. > > For your entertainment here’s a chart that Excel produced from the > data. (To make it more > > dramatic, I carefully did not use a proper zero point on the X-axis.) > > I do not have per-category breakdowns, sorry, just the raw total. > > Makes me think more seriously about cruising the bug list for > something that looks like > > I could actually fix it… > More users and more bug reports - I wouldn't be surprised if the hidden number of "bugs" was over 10k. Is there a way to get bugs fixed per month or per week? Even if that number was decreasing I'd just attribute it to llvm/clang stabilizing and things becoming more complicated to fix. By comparison how many open issues does gcc currently have. 
I'm not trying to take away from your point, but there's probably no magic bullet. Maybe a bug fix month?
(Trying to sort out a bunch of low hanging fruit > that new developers may be able to fix?) > ______________________________**_________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/**mailman/listinfo/llvmdev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From westdac at gmail.com Tue Jul 30 12:14:16 2013 From: westdac at gmail.com (Dan) Date: Tue, 30 Jul 2013 13:14:16 -0600 Subject: [LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64 Message-ID: I'll try to run through the scenario: 64-bit register type target (all registers have 64 bits). all 32-bits are getting promoted to 64-bit integers Problem: MUL on i32 is getting promoted to MUL on i64 MUL on i64 is getting expanded to a library call in compiler-rt the problem is that MUL32 gets promoted and then converted into a subroutine call because it is now type i64, even though I want the MUL I32 to remain as an operation in the architecture. MUL i32 would generate a 64-bit results from the lower 32-bit portions of 64-bit source operands. In customize for the operations, I am trying to do something like: case ISD::MUL: { EVT OpVT = Op.getValueType(); if (OpVT == MVT::i64) { RTLIB::Libcall LC = RTLIB::MUL_I64; SDValue Dummy; return ExpandLibCall(LC, Op, DAG, false, Dummy, *this); } else if (OpVT == MVT::i32){ ??? What to do here to not have issues with type i32 } } I've gone a few directions on this. Defining the architecture type i32 leads to a lot of changes that I don't think is the most straightforward change. Would think there is a way to promote the MUL i32 types but still be able to "see" that as a MUL i32 somewhere down the lowering process. Are there suggestions on how to promote the type, but then be able to customize the original i64 to a call and the original mul i32 to an operation? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From silvas at purdue.edu Tue Jul 30 12:27:18 2013 From: silvas at purdue.edu (Sean Silva) Date: Tue, 30 Jul 2013 12:27:18 -0700 Subject: [LLVMdev] [cfe-dev] llvm.org bug trend In-Reply-To: References: Message-ID: It would be really nice if our bug tracker had something like the "Issues: 30-day summary" that this jira instance has < https://issues.apache.org/jira/browse/LUCENE>. Another statistic that might be interesting to plot alongside the "open bugs" graph is "total lines of code" (or maybe just plot the ratio "bugs/LOC"). -- Sean Silva On Tue, Jul 30, 2013 at 11:31 AM, Robinson, Paul < Paul_Robinson at playstation.sony.com> wrote: > Over most of the past year, I have been keeping an eye on the overall > LLVM.org open-bug count. > > Sampling the count (almost) every Monday morning, it is a consistently > non-decreasing number. > > I thought I'd post something about it to the Dev lists, as the count broke > 4000 this past week. > > For your entertainment here's a chart that Excel produced from the data. > (To make it more > > dramatic, I carefully did not use a proper zero point on the X-axis.) > > I do not have per-category breakdowns, sorry, just the raw total. > > Makes me think more seriously about cruising the bug list for something > that looks like > > I could actually fix it… > > --paulr > > [image: cid:image001.png at 01CE8C56.9A40D7E0] > > _______________________________________________ > cfe-dev mailing list > cfe-dev at cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: image001.png Type: image/png Size: 11909 bytes Desc: not available URL: From tom at stellard.net Tue Jul 30 12:55:19 2013 From: tom at stellard.net (Tom Stellard) Date: Tue, 30 Jul 2013 12:55:19 -0700 Subject: [LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64 In-Reply-To: References: Message-ID: <20130730195519.GE1749@L7-CNU1252LKR-172027226155.amd.com> On Tue, Jul 30, 2013 at 01:14:16PM -0600, Dan wrote: > I'll try to run through the scenario: > > > 64-bit register type target (all registers have 64 bits). > > all 32-bits are getting promoted to 64-bit integers > > Problem: > > MUL on i32 is getting promoted to MUL on i64 > > MUL on i64 is getting expanded to a library call in compiler-rt > > Can you fix this by marking i64 MUL as Legal? > the problem is that MUL32 gets promoted and then converted into a > subroutine call because it is now type i64, even though I want the MUL I32 > to remain as an operation in the architecture. MUL i32 would generate a > 64-bit results from the lower 32-bit portions of 64-bit source operands. > > In customize for the operations, I am trying to do something like: > > case ISD::MUL: > { > EVT OpVT = Op.getValueType(); > if (OpVT == MVT::i64) { > RTLIB::Libcall LC = RTLIB::MUL_I64; > SDValue Dummy; > return ExpandLibCall(LC, Op, DAG, false, Dummy, *this); > } > else if (OpVT == MVT::i32){ > > ??? What to do here to not have issues with type i32 > } > } > > > I've gone a few directions on this. > > Defining the architecture type i32 leads to a lot of changes that I don't > think is the most straightforward change. > When you say 'defining an architecture type' do you mean with addRegisterClass() in your TargetLowering constructor? If so, then this would be my recommendation. Can you elaborate more on what is preventing you from doing this. > Would think there is a way to promote the MUL i32 types but still be able > to "see" that as a MUL i32 somewhere down the lowering process. 
> The R600 backend does something similar to this. It has 24-bit MUL and MAD instructions and selects these by looking at an i32 integer and trying to infer whether or not it is really a 24-bit value. See the SelectI24 and SelectU24 functions in AMDGPUISelDAGToDAG.cpp. -Tom > Are there suggestions on how to promote the type, but then be able to > customize the original i64 to a call and the original mul i32 to an > operation? > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From mclow.lists at gmail.com Tue Jul 30 14:01:37 2013 From: mclow.lists at gmail.com (Marshall Clow) Date: Tue, 30 Jul 2013 14:01:37 -0700 Subject: [LLVMdev] Weird error from Undefined Sanitizer Message-ID: # Everything is done on Mac OS X 10.8.4, with llvm/clang/libc++/libc++abi built from source this morning # totclang is an alias for the built clang. $ export LLVM=/Sources/LLVM $ export LIBCXX=$LLVM/libcxx $ export LIBCXXABI=$LLVM/libcxxabi $ totclang -std=c++11 -stdlib=libc++ -I $LIBCXX/include -fsanitize=undefined ubsan.cpp -L $LIBCXX/lib -L $LIBCXXABI/lib -lc++abi $ DYLD_LIBRARY_PATH=$LIBCXX/lib:$LIBCXXABI/lib ./a.out ubsan.cpp:22:5: runtime error: member call on address 0x7fff4fc2eaf0 which does not point to an object of type 'std::runtime_error' 0x7fff4fc2eaf0: note: object is of type 'std::runtime_error' 00 00 00 00 e0 7a b5 10 01 00 00 00 38 39 c0 7b dc 7f 00 00 ac b3 fd 0f 01 00 00 00 5e 20 bd 6f ^~~~~~~~~~~~~~~~~~~~~~~ vptr for 'std::runtime_error' : runtime error: member call on address 0x7fff4fc2eaf0 which does not point to an object of type 'std::runtime_error' 0x7fff4fc2eaf0: note: object is of type 'std::runtime_error' 00 00 00 00 e0 7a b5 10 01 00 00 00 38 39 c0 7b dc 7f 00 00 ac b3 fd 0f 01 00 00 00 5e 20 bd 6f ^~~~~~~~~~~~~~~~~~~~~~~ vptr for 'std::runtime_error' Note: In a non-proportional font, the carets point to the 'e' in "e0 7a" So, what 
is at address 0x7fff4fc2eaf0? Is it: a) an object of type 'std::runtime_error' b) not an object of type 'std::runtime_error' Or have I completely misinterpreted this? -- Marshall Marshall Clow Idio Software A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait). -- Yu Suzuki -------------- next part -------------- A non-text attachment was scrubbed... Name: ubsan.cpp Type: application/octet-stream Size: 619 bytes Desc: not available URL: From eli.friedman at gmail.com Tue Jul 30 14:12:50 2013 From: eli.friedman at gmail.com (Eli Friedman) Date: Tue, 30 Jul 2013 14:12:50 -0700 Subject: [LLVMdev] Weird error from Undefined Sanitizer In-Reply-To: References: Message-ID: On Tue, Jul 30, 2013 at 2:01 PM, Marshall Clow wrote: > # Everything is done on Mac OS X 10.8.4, with llvm/clang/libc++/libc++abi built from source this morning > # totclang is an alias for the built clang. > > $ export LLVM=/Sources/LLVM > $ export LIBCXX=$LLVM/libcxx > $ export LIBCXXABI=$LLVM/libcxxabi > > $ totclang -std=c++11 -stdlib=libc++ -I $LIBCXX/include -fsanitize=undefined ubsan.cpp -L $LIBCXX/lib -L $LIBCXXABI/lib -lc++abi > $ DYLD_LIBRARY_PATH=$LIBCXX/lib:$LIBCXXABI/lib ./a.out > ubsan.cpp:22:5: runtime error: member call on address 0x7fff4fc2eaf0 which does not point to an object of type 'std::runtime_error' > 0x7fff4fc2eaf0: note: object is of type 'std::runtime_error' > 00 00 00 00 e0 7a b5 10 01 00 00 00 38 39 c0 7b dc 7f 00 00 ac b3 fd 0f 01 00 00 00 5e 20 bd 6f > ^~~~~~~~~~~~~~~~~~~~~~~ > vptr for 'std::runtime_error' > : runtime error: member call on address 0x7fff4fc2eaf0 which does not point to an object of type 'std::runtime_error' > 0x7fff4fc2eaf0: note: object is of type 'std::runtime_error' > 00 00 00 00 e0 7a b5 10 01 00 00 00 38 39 c0 7b dc 7f 00 00 ac b3 fd 0f 01 00 00 00 5e 20 bd 6f > ^~~~~~~~~~~~~~~~~~~~~~~ > vptr for 'std::runtime_error' > > Note: In a non-proportional font, the carets point to the 'e' in "e0 
7a" > > So, what is at address 0x7fff4fc2eaf0? > > Is it: > a) an object of type 'std::runtime_error' > b) not an object of type 'std::runtime_error' Have you considered "c) both"? :) My guess is that there are multiple definitions of std::runtime_error which aren't getting linked together correctly in the process. -Eli From atrick at apple.com Tue Jul 30 14:20:03 2013 From: atrick at apple.com (Andrew Trick) Date: Tue, 30 Jul 2013 14:20:03 -0700 Subject: [LLVMdev] creating SCEV taking too long In-Reply-To: References: Message-ID: <49738061-70AA-4576-8093-ED674F975EA4@apple.com> On Jul 29, 2013, at 4:08 PM, Guo, Xiaoyi wrote: > Hi, > > We have a benchmark where there are 128 MAD computations in a loop. (See the attached IR.) Creating SCEVs for these expressions takes a long time, making the compile time too long. E.g., running opt with the “indvars” pass only takes 45 seconds. > > It seems that the majority of the time is spent in comparing the complexity of the expression operands to sort them. > > I realize that the expression grows to be really large towards the end of the loop. > > I don’t know of all the uses of the built SCEV. But I image it won’t be very useful for such complex expressions. Yet, it’s making the compile time much longer. > > So I’m wondering if it would make sense to abort the creation of SCEV when the expression gets really complex and large. Or is there any way to further optimize the performance of SCEV building for such cases. > > Thanks in advance for any response. Nice test case. I tried printing the SCEV… oops. I haven’t seen a case this bad, but I know you’re not the first to run into this problem. There are two steps in GroupByComplexity, sorting (std::sort) and grouping (N^2). The sort calls SCEVComplexityCompare::compare() which can make multiple recursive calls for nodes with multiple operands. This looks like it could be a disaster for expressions that are not strictly trees--exponential in the size of the DAG. 
If you just have a very large tree with many similar looking subexpressions, then I’m not sure what to do except cut it into reasonable subtrees. AFAICT, it’s not just sorting that’s a problem but also grouping? Also, I think the shear depth of the createSCEV recursion is itself a problem. I don’t see any reason not to limit the size of SCEV expressions, but I also don’t have a brilliant idea for how to do it at the moment (other than the obvious depth cutoff). -Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: From eliben at google.com Tue Jul 30 15:11:48 2013 From: eliben at google.com (Eli Bendersky) Date: Tue, 30 Jul 2013 15:11:48 -0700 Subject: [LLVMdev] PNaCl Bitcode reference manual Message-ID: Hello, Following an earlier email ( http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-June/063010.html), we've published an initial version of the PNaCl bitcode reference manual online - http://www.chromium.org/nativeclient/pnacl/bitcode-abi. The PNaCl bitcode is a restricted subset of LLVM IR. The reference manual is quite terse, so for the bigger picture I'll repost links to the design document: * PDF: https://docs.google.com/a/chromium.org/viewer?a=v&pid=sites&srcid=Y2hyb21pdW0ub3JnfGRldnxneDo0OWYwZjVkYWFjOWNjODE1 * Text: https://sites.google.com/a/chromium.org/dev/nativeclient/pnacl/stability-of-the-pnacl-bitcode-abi Any comments would be most welcome. If anything isn't clear, please speak up - we intend to improve the documentation incrementally. We're also working on better formatting, so consider this an early preview :) Eli -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kledzik at apple.com Tue Jul 30 15:43:16 2013 From: kledzik at apple.com (Nick Kledzik) Date: Tue, 30 Jul 2013 15:43:16 -0700 Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections In-Reply-To: <51F6A1DB.5020909@codeaurora.org> References: <51F1664F.1040003@codeaurora.org> <51F19211.6050403@codeaurora.org> <0E723450-43B9-4E32-BFE6-EEF5468128A2@apple.com> <51F6A1DB.5020909@codeaurora.org> Message-ID: <32658219-59C5-4667-ADAA-A0A941BD0EA2@apple.com> On Jul 29, 2013, at 10:09 AM, Shankar Easwaran wrote: > On 7/29/2013 11:24 AM, Nick Kledzik wrote: >> On Jul 25, 2013, at 2:10 PM, Rui Ueyama wrote: >>> Is there any reason -ffunction-sections and -fdata-sections wouldn't work? If it'll work, it may be be better to say "if you want to get a better linker output use these options", rather than defining new ELF section. >> >From my understanding, -ffunction-sections is a good semantic match. But it introduces a lot of bloat in the .o file which the linker must process. >> >> For reference, with mach-o we just added a flag to the overall .o file that says all sections are "safe". The compiler always generates safe object files (unless there is inline code with non-local labels) and always sets the flag. Hand written assembly files did not have the flag by default, but savvy assembly programmers can set it. > We could set this flag for ELF too in the ELF header, but it wouldnot not confirm to the ELF ABI. > > To account safe sections, we should just create an additional section in the ELF (gcc creates a lot many sections to handle executable stack and for LTO). This would just be another section to dictate what sections are safe. Or just create a new empty section with a magic name whose existence tells the linker that all sections are safe. > Isnt it better to have this flag set for every section in Darwin too, makes it flexible. I am not sure about the ABI concerns on Darwin though. I don't see the benefit of that level of detail. 
Either the compiler produced the object file, so all sections are safe. Or it was hand written. If hand written, it is much easier to either say all sections are safe or none are. Listing which are safe and which are not would be a pain. -Nick From Xiaoyi.Guo at amd.com Tue Jul 30 16:10:51 2013 From: Xiaoyi.Guo at amd.com (Guo, Xiaoyi) Date: Tue, 30 Jul 2013 23:10:51 +0000 Subject: [LLVMdev] creating SCEV taking too long In-Reply-To: <49738061-70AA-4576-8093-ED674F975EA4@apple.com> References: <49738061-70AA-4576-8093-ED674F975EA4@apple.com> Message-ID: Hi Andy, Thanks very much for looking into the problem. In this particular test case, it seems most of the time is spent in the sorting, not the grouping. Later, I realized that it seems in this test case most of the expressions to be compared have different length. I tried the following change in compare() when the LHS and RHS's types are the same: =================================================================== --- lib/Analysis/ScalarEvolution.cpp (revision 187379) +++ lib/Analysis/ScalarEvolution.cpp (working copy) @@ -585,6 +585,9 @@ case scAddExpr: case scMulExpr: case scSMaxExpr: case scUMaxExpr: { const SCEVNAryExpr *LC = cast(LHS); const SCEVNAryExpr *RC = cast(RHS); // Lexicographically compare n-ary expressions. unsigned LNumOps = LC->getNumOperands(), RNumOps = RC->getNumOperands(); + if (LNumOps != RNumOps) { + return (int)LNumOps - (int)RNumOps; + } for (unsigned i = 0; i != LNumOps; ++i) { if (i >= RNumOps) return 1; And the compile time is cut down from 45s to 1s. This will give different sorting result than the original algorithm. However, it looks like that shouldn't be a problem according to this comment before the switch statement in compare(); // Aside from the getSCEVType() ordering, the particular ordering // isn't very important except that it's beneficial to be consistent, // so that (a + b) and (b + a) don't end up as different expressions. Does this solution seem ok? 
If the above solution seems ok, that solves the problem for cases when most of the time the expressions to be compared have different lengths. However, the problem still exists if the expressions to be compared are large, similar, and have the same length. Maybe I'll leave that to later when there's a test case for such situations? Thanks, Xiaoyi From: Andrew Trick [mailto:atrick at apple.com] Sent: Tuesday, July 30, 2013 2:20 PM To: Guo, Xiaoyi Cc: LLVMdev at cs.uiuc.edu; Dan Gohman Subject: Re: [LLVMdev] creating SCEV taking too long On Jul 29, 2013, at 4:08 PM, Guo, Xiaoyi > wrote: Hi, We have a benchmark where there are 128 MAD computations in a loop. (See the attached IR.) Creating SCEVs for these expressions takes a long time, making the compile time too long. E.g., running opt with the "indvars" pass only takes 45 seconds. It seems that the majority of the time is spent in comparing the complexity of the expression operands to sort them. I realize that the expression grows to be really large towards the end of the loop. I don't know of all the uses of the built SCEV. But I image it won't be very useful for such complex expressions. Yet, it's making the compile time much longer. So I'm wondering if it would make sense to abort the creation of SCEV when the expression gets really complex and large. Or is there any way to further optimize the performance of SCEV building for such cases. Thanks in advance for any response. Nice test case. I tried printing the SCEV... oops. I haven't seen a case this bad, but I know you're not the first to run into this problem. There are two steps in GroupByComplexity, sorting (std::sort) and grouping (N^2). The sort calls SCEVComplexityCompare::compare() which can make multiple recursive calls for nodes with multiple operands. This looks like it could be a disaster for expressions that are not strictly trees--exponential in the size of the DAG. 
If you just have a very large tree with many similar-looking subexpressions, then I'm not sure what to do except cut it into reasonable subtrees. AFAICT, it's not just sorting that's a problem but also grouping? Also, I think the sheer depth of the createSCEV recursion is itself a problem. I don't see any reason not to limit the size of SCEV expressions, but I also don't have a brilliant idea for how to do it at the moment (other than the obvious depth cutoff). -Andy

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From echristo at gmail.com Tue Jul 30 16:28:20 2013 From: echristo at gmail.com (Eric Christopher) Date: Tue, 30 Jul 2013 16:28:20 -0700 Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections In-Reply-To: <0E723450-43B9-4E32-BFE6-EEF5468128A2@apple.com> References: <51F1664F.1040003@codeaurora.org> <51F19211.6050403@codeaurora.org> <0E723450-43B9-4E32-BFE6-EEF5468128A2@apple.com> Message-ID:

On Mon, Jul 29, 2013 at 9:24 AM, Nick Kledzik wrote: > > On Jul 25, 2013, at 2:10 PM, Rui Ueyama wrote: >> Is there any reason -ffunction-sections and -fdata-sections wouldn't work? If they'll work, it may be better to say "if you want to get better linker output, use these options" rather than defining a new ELF section. > > From my understanding, -ffunction-sections is a good semantic match. But it introduces a lot of bloat in the .o file which the linker must process. >

Drive-by comment here: other than the overhead of the section header, I'm not sure what bloat you're talking about here that the linker needs to process?
-eric

From renato.golin at linaro.org Tue Jul 30 16:39:37 2013 From: renato.golin at linaro.org (Renato Golin) Date: Wed, 31 Jul 2013 00:39:37 +0100 Subject: [LLVMdev] Enabling the SLP-vectorizer by default for -O3 In-Reply-To: References: <4A0723D2-9E95-4CAF-9E83-0B33EFCA3ECE@apple.com> <1A62205A-6544-421E-B965-29C7779E877E@apple.com> <99897FE3-5C07-4702-8DE8-DF76DB551609@apple.com> Message-ID:

Nadav,

I ran some benchmarks and I'm seeing a 3-6% performance loss on a few of them when we use SLP on both O2 and O3 (with O3 showing the biggest differences). Unfortunately, my benchmarking is not scientific, so I can't vouch for those numbers, nor will I have time to investigate more closely in the short term, but I wouldn't be surprised if this is a result of the extra shuffles we were seeing a few months back on the BB-Vect. This means that we could maybe trim that off later by fixing two or three bugs and (fingers crossed) making those shuffles go away.

I'm trying to set up a task to compare the most important compile options (including all three vectorizers) at all optimization levels, but that's not going to happen any time soon, so don't take my word for it. If I'm the only one with bad numbers, I'm sure we can fix the issues later if you do introduce SLP into O3 now. Though, I would like to wait a bit more for O2 and Os, because I also can't vouch for its correctness, since we don't have buildbots with SLP enabled, nor on O2, and O2 is more or less the "fast, but still stable" state that people prefer for compiling production code.

Turn it on at O3, let's see how the bots behave, and let's get a few data points and a bit more information on its state.

cheers, --renato

On 29 July 2013 21:07, Jim Grosbach wrote: > Cool. Thanks! > -Jim > > On Jul 29, 2013, at 1:07 PM, Renato Golin wrote: > > On 29 July 2013 20:39, Jim Grosbach wrote: > >> These results are really excellent. They’re on Intel, I assume, right? >> What do the ARM numbers look like?
Before enabling by default, we should >> make sure that the results are comparable there as well. >> > > Hi Jim, > > I'll have a look. > > --renato > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From don.apoch at gmail.com Tue Jul 30 17:18:29 2013 From: don.apoch at gmail.com (Michael Lewis) Date: Tue, 30 Jul 2013 17:18:29 -0700 Subject: [LLVMdev] Interpreting stack maps for purposes of precise GC Message-ID: Hi all, I've been using the llvm.gcroot intrinsic combined with the generated machine stack maps to attempt to do precise GC without the use of a shadow stack. (This is all research work on a novel language so I have no existing system to compare against, for the record.) Most of my test suite is working and tracing stack roots correctly. However, there seem to be some scenarios where the stack maps do not agree with reality. I suspect this has to do with SP manipulations during the execution of some particular piece of code, but it's hard to gather evidence to corroborate this theory. I did notice that the stack map is invariant (at the LLVM level) with respect to which safe point is actually reached within the host function; i.e. if I have Foo() with two safe points and manipulate the SP between point A and point B, the stack map becomes bogus because nothing accounts for the change to the SP. I'm not sure if this scenario is the actual explanation, but I've also noticed that occasionally the stack map will just seem wrong; it will mark certain stack slots as live roots which are outside the bounds of the actual machine stack frame, for instance. This obviously causes the tracing phase of the GC to wander off into random bits of memory and (usually) crash shortly thereafter. Unfortunately it seems like there is painfully little documentation on how the stack maps work or are meant to be used, so I was hoping to dig up some tribal knowledge from the list. 
My strategy for interpreting the maps currently looks like this: if (stack offset <= 0) pointer to root = start of current stack frame + offset else pointer to root = end of the stack frame "above" current stack frame + offset + size of frame pointer + size of return address pointer The reason for this split is that when the offset is negative it seems to refer to one span of stack space, whereas when it is positive it appears to be based from a different SP entirely. I found this approach by brute force, i.e. generating a large number of test cases and mapping out the stack on paper until the offsets revealed some semblance of a pattern. However, I'm suspicious about my interpretation of the two cases because of the aforementioned mis-flagging of roots, but again there seems to be no documentation whatsoever describing how to actually find a stack address based on a value in the stack map. Any/all advice would be much appreciated! Thanks, - Mike -------------- next part -------------- An HTML attachment was scrubbed... URL: From shankare at codeaurora.org Tue Jul 30 17:33:22 2013 From: shankare at codeaurora.org (Shankar Easwaran) Date: Tue, 30 Jul 2013 19:33:22 -0500 Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections In-Reply-To: <32658219-59C5-4667-ADAA-A0A941BD0EA2@apple.com> References: <51F1664F.1040003@codeaurora.org> <51F19211.6050403@codeaurora.org> <0E723450-43B9-4E32-BFE6-EEF5468128A2@apple.com> <51F6A1DB.5020909@codeaurora.org> <32658219-59C5-4667-ADAA-A0A941BD0EA2@apple.com> Message-ID: <51F85B52.5000604@codeaurora.org> On 7/30/2013 5:43 PM, Nick Kledzik wrote: > On Jul 29, 2013, at 10:09 AM, Shankar Easwaran wrote: > >> On 7/29/2013 11:24 AM, Nick Kledzik wrote: >>> On Jul 25, 2013, at 2:10 PM, Rui Ueyama wrote: >>>> Is there any reason -ffunction-sections and -fdata-sections wouldn't work? If it'll work, it may be be better to say "if you want to get a better linker output use these options", rather than defining new ELF section. 
>>> From my understanding, -ffunction-sections is a good semantic match. But it introduces a lot of bloat in the .o file which the linker must process. >>> >>> For reference, with mach-o we just added a flag to the overall .o file that says all sections are "safe". The compiler always generates safe object files (unless there is inline code with non-local labels) and always sets the flag. Hand-written assembly files did not have the flag by default, but savvy assembly programmers can set it. >> We could set this flag for ELF too in the ELF header, but it would not conform to the ELF ABI. >> >> To account for safe sections, we should just create an additional section in the ELF (gcc creates a lot of sections to handle executable stacks and LTO). This would just be another section to dictate which sections are safe. > Or just create a new empty section with a magic name whose existence tells the linker that all sections are safe. > >> Isn't it better to have this flag set for every section in Darwin too? It makes it flexible. I am not sure about the ABI concerns on Darwin, though. > I don't see the benefit of that level of detail. Either the compiler produced the object file, so all sections are safe. Or it was hand written. If hand written, it is much easier to either say all sections are safe or none are. Listing which are safe and which are not would be a pain.

I can think of two use cases where the compiler needs to emit safe sections on a section-by-section basis:

* code having inline assembly (it only affects the text section)
* the compiler trying to do some optimizations that deal with data placed outside function boundaries

Does this make sense?

Thanks

Shankar Easwaran

-- Qualcomm Innovation Center, Inc.
is a member of Code Aurora Forum, hosted by the Linux Foundation From kledzik at apple.com Tue Jul 30 17:36:40 2013 From: kledzik at apple.com (Nick Kledzik) Date: Tue, 30 Jul 2013 17:36:40 -0700 Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections In-Reply-To: References: <51F1664F.1040003@codeaurora.org> <51F19211.6050403@codeaurora.org> <0E723450-43B9-4E32-BFE6-EEF5468128A2@apple.com> Message-ID: <868799E2-FECF-4CBA-B48A-33080712EACD@apple.com> On Jul 30, 2013, at 4:28 PM, Eric Christopher wrote: > On Mon, Jul 29, 2013 at 9:24 AM, Nick Kledzik wrote: >> >> On Jul 25, 2013, at 2:10 PM, Rui Ueyama wrote: >>> Is there any reason -ffunction-sections and -fdata-sections wouldn't work? If it'll work, it may be be better to say "if you want to get a better linker output use these options", rather than defining new ELF section. >> >> From my understanding, -ffunction-sections is a good semantic match. But it introduces a lot of bloat in the .o file which the linker must process. >> > > Drive by comment here: > > Other than the overhead of the section header I'm not sure what bloat > you're talking about here that the linker needs to process? The internal model of lld is "atom" based. Each atom is an indivisible run of bytes. A compiler generated function naturally matches that and should be an atom. The problem is that a hand written assembly code could look like it has a couple of functions, but there could be implicit dependencies (like falling through to next function). If an object file has a hundred functions, that means there will be a hundred more sections (one per function). So, if we used -ffunction-sections to determine that an object file was compiler generated, we still have the problem that an assembly language programmer could have hand written extra sections that look like -ffunction-sections would have produced, but he did something tricky like have one function with two entry symbols. 
So, the linker would need to double check all those hundred sections. -Nick From chandlerc at google.com Tue Jul 30 17:41:37 2013 From: chandlerc at google.com (Chandler Carruth) Date: Tue, 30 Jul 2013 17:41:37 -0700 Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections In-Reply-To: <51F85B52.5000604@codeaurora.org> References: <51F1664F.1040003@codeaurora.org> <51F19211.6050403@codeaurora.org> <0E723450-43B9-4E32-BFE6-EEF5468128A2@apple.com> <51F6A1DB.5020909@codeaurora.org> <32658219-59C5-4667-ADAA-A0A941BD0EA2@apple.com> <51F85B52.5000604@codeaurora.org> Message-ID: I've not been following this thread at all. However, skimming the original post, I fail to find a nice summary of what problem is trying to be solved. By reading the rest of the thread I divine that the goal is faster links and better dead code stripping? Making that clearer would help. Naming your sections something other than "safe" (which has *very* different connotations) would help more. However, I question several fundamental assumptions: 1) We should be very certain that -ffunction-sections is not a viable solution as it exists and is well supported in other toolchains and environments. 2) We should talk to other ELF producers and coordinate to make sure we don't end up creating a twisty maze of extensions here. 3) We should step back and consider leapfrogging to a fully specialized format to reap even more performance benefits rather than messily patching ELF. #3 may prove irrelevant if this is the only major hurdle for speeding up ELF links. My impression was otherwise. #2 hasn't been done by the other ELF producers, but we should strive to do better. #1 can be solved via science. 
On Tue, Jul 30, 2013 at 5:33 PM, Shankar Easwaran wrote: > On 7/30/2013 5:43 PM, Nick Kledzik wrote: > >> On Jul 29, 2013, at 10:09 AM, Shankar Easwaran wrote: >> >> On 7/29/2013 11:24 AM, Nick Kledzik wrote: >>> >>>> On Jul 25, 2013, at 2:10 PM, Rui Ueyama wrote: >>>> >>>>> Is there any reason -ffunction-sections and -fdata-sections wouldn't >>>>> work? If it'll work, it may be be better to say "if you want to get a >>>>> better linker output use these options", rather than defining new ELF >>>>> section. >>>>> >>>> >From my understanding, -ffunction-sections is a good semantic match. >>>> But it introduces a lot of bloat in the .o file which the linker must >>>> process. >>>> >>>> For reference, with mach-o we just added a flag to the overall .o file >>>> that says all sections are "safe". The compiler always generates safe >>>> object files (unless there is inline code with non-local labels) and always >>>> sets the flag. Hand written assembly files did not have the flag by >>>> default, but savvy assembly programmers can set it. >>>> >>> We could set this flag for ELF too in the ELF header, but it wouldnot >>> not confirm to the ELF ABI. >>> >>> To account safe sections, we should just create an additional section in >>> the ELF (gcc creates a lot many sections to handle executable stack and for >>> LTO). This would just be another section to dictate what sections are safe. >>> >> Or just create a new empty section with a magic name whose existence >> tells the linker that all sections are safe. >> >> Isnt it better to have this flag set for every section in Darwin too, >>> makes it flexible. I am not sure about the ABI concerns on Darwin though. >>> >> I don't see the benefit of that level of detail. Either the compiler >> produced the object file, so all sections are safe. Or it was hand written. >> If hand written, it is much easier to either say all sections are safe or >> none are. Listing which are safe and which are not would be a pain. 
>> > I can think of two usecases when the compiler needs to emit safe sections > on a section by section basis. > > * code having inline assembly (it only affects the text section) > * the compiler trying to do some optimizations that deals with data placed > outside function boundaries. > > Does this make sense ? > > > Thanks > > Shankar Easwaran > > -- > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted > by the Linux Foundation > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rnk at google.com Tue Jul 30 17:43:43 2013 From: rnk at google.com (Reid Kleckner) Date: Tue, 30 Jul 2013 17:43:43 -0700 Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections In-Reply-To: <51F85B52.5000604@codeaurora.org> References: <51F1664F.1040003@codeaurora.org> <51F19211.6050403@codeaurora.org> <0E723450-43B9-4E32-BFE6-EEF5468128A2@apple.com> <51F6A1DB.5020909@codeaurora.org> <32658219-59C5-4667-ADAA-A0A941BD0EA2@apple.com> <51F85B52.5000604@codeaurora.org> Message-ID: Can you guys invent a more specific word than "safe" to describe this concept? The best thing I can come up with is something like "atomizable section". This is basically describing code and data that the linker should feel free to strip and reorder, right? I'm not tracking this closely enough to be totally sure if that's a reasonable word for this. On Tue, Jul 30, 2013 at 5:33 PM, Shankar Easwaran wrote: > On 7/30/2013 5:43 PM, Nick Kledzik wrote: > >> On Jul 29, 2013, at 10:09 AM, Shankar Easwaran wrote: >> >> On 7/29/2013 11:24 AM, Nick Kledzik wrote: >>> >>>> On Jul 25, 2013, at 2:10 PM, Rui Ueyama wrote: >>>> >>>>> Is there any reason -ffunction-sections and -fdata-sections wouldn't >>>>> work? If it'll work, it may be be better to say "if you want to get a >>>>> better linker output use these options", rather than defining new ELF >>>>> section. >>>>> >>>> >From my understanding, -ffunction-sections is a good semantic match. 
>>>> But it introduces a lot of bloat in the .o file which the linker must >>>> process. >>>> >>>> For reference, with mach-o we just added a flag to the overall .o file >>>> that says all sections are "safe". The compiler always generates safe >>>> object files (unless there is inline code with non-local labels) and always >>>> sets the flag. Hand written assembly files did not have the flag by >>>> default, but savvy assembly programmers can set it. >>>> >>> We could set this flag for ELF too in the ELF header, but it wouldnot >>> not confirm to the ELF ABI. >>> >>> To account safe sections, we should just create an additional section in >>> the ELF (gcc creates a lot many sections to handle executable stack and for >>> LTO). This would just be another section to dictate what sections are safe. >>> >> Or just create a new empty section with a magic name whose existence >> tells the linker that all sections are safe. >> >> Isnt it better to have this flag set for every section in Darwin too, >>> makes it flexible. I am not sure about the ABI concerns on Darwin though. >>> >> I don't see the benefit of that level of detail. Either the compiler >> produced the object file, so all sections are safe. Or it was hand written. >> If hand written, it is much easier to either say all sections are safe or >> none are. Listing which are safe and which are not would be a pain. >> > I can think of two usecases when the compiler needs to emit safe sections > on a section by section basis. > > * code having inline assembly (it only affects the text section) > * the compiler trying to do some optimizations that deals with data placed > outside function boundaries. > > Does this make sense ? > > > Thanks > > Shankar Easwaran > > -- > Qualcomm Innovation Center, Inc. 
is a member of Code Aurora Forum, hosted > by the Linux Foundation > > ______________________________**_________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/**mailman/listinfo/llvmdev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From echristo at gmail.com Tue Jul 30 17:50:47 2013 From: echristo at gmail.com (Eric Christopher) Date: Tue, 30 Jul 2013 17:50:47 -0700 Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections In-Reply-To: <868799E2-FECF-4CBA-B48A-33080712EACD@apple.com> References: <51F1664F.1040003@codeaurora.org> <51F19211.6050403@codeaurora.org> <0E723450-43B9-4E32-BFE6-EEF5468128A2@apple.com> <868799E2-FECF-4CBA-B48A-33080712EACD@apple.com> Message-ID: On Tue, Jul 30, 2013 at 5:36 PM, Nick Kledzik wrote: > > On Jul 30, 2013, at 4:28 PM, Eric Christopher wrote: >> On Mon, Jul 29, 2013 at 9:24 AM, Nick Kledzik wrote: >>> >>> On Jul 25, 2013, at 2:10 PM, Rui Ueyama wrote: >>>> Is there any reason -ffunction-sections and -fdata-sections wouldn't work? If it'll work, it may be be better to say "if you want to get a better linker output use these options", rather than defining new ELF section. >>> >>> From my understanding, -ffunction-sections is a good semantic match. But it introduces a lot of bloat in the .o file which the linker must process. >>> >> >> Drive by comment here: >> >> Other than the overhead of the section header I'm not sure what bloat >> you're talking about here that the linker needs to process? > > The internal model of lld is "atom" based. Each atom is an indivisible run of bytes. A compiler generated function naturally matches that and should be an atom. The problem is that a hand written assembly code could look like it has a couple of functions, but there could be implicit dependencies (like falling through to next function). 
I'll stipulate all of this :)

> If an object file has a hundred functions, that means there will be a hundred more sections (one per function). So, if we used -ffunction-sections to determine that an object file was compiler generated, we still have the problem that an assembly language programmer could have hand-written extra sections that look like what -ffunction-sections would have produced, but he did something tricky like having one function with two entry symbols. So, the linker would need to double-check all those hundred sections.

I'm not talking about using -ffunction-sections to determine whether something is compiler generated, just that there's no inherent penalty in using -ffunction-sections in general. Basically, there's no benefit (unless you allow a flag per object, etc.) to a marker that says whether or not something is "compiler generated"; you may as well just use a flag to the linker or a section in the output (the latter is a fairly common ELF-ism).

-eric

From westdac at gmail.com Tue Jul 30 17:56:44 2013 From: westdac at gmail.com (Dan) Date: Tue, 30 Jul 2013 18:56:44 -0600 Subject: [LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64 In-Reply-To: <20130730195519.GE1749@L7-CNU1252LKR-172027226155.amd.com> References: <20130730195519.GE1749@L7-CNU1252LKR-172027226155.amd.com> Message-ID:

Thanks for the information; let me re-phrase the question.

Assume 64-bit register types, but integer is 32-bit. I already have table generation of the 64-bit operation descriptions. How about this modified approach? Before type legalization, I'd really like to move all MUL i64 to a subroutine call of my own choice. This would be a form of customization, but I want it to happen before type legalization. Right now, type legalization promotes all MUL i32 to 64-bit, and I lose the ability to differentiate between what originally was a MUL on 64-bit and 32-bit values.
Only thing that I have seen happen at DAG Selection is for lowering custom intrinsic functions like memcpy: ./Target/X86/X86SelectionDAGInfo.cpp:178:X86SelectionDAGInfo::EmitTargetCodeForMemcpy(SelectionDAG &DAG, Is there a general SelectionDAG conversion that can be made to happen before all type promotions? Again, even modifications in ISelDAGToDAG.cpp will be after type promotion in my understanding. On Tue, Jul 30, 2013 at 1:55 PM, Tom Stellard wrote: > On Tue, Jul 30, 2013 at 01:14:16PM -0600, Dan wrote: > > I'll try to run through the scenario: > > > > > > 64-bit register type target (all registers have 64 bits). > > > > all 32-bits are getting promoted to 64-bit integers > > > > Problem: > > > > MUL on i32 is getting promoted to MUL on i64 > > > > MUL on i64 is getting expanded to a library call in compiler-rt > > > > > > Can you fix this by marking i64 MUL as Legal? > > > the problem is that MUL32 gets promoted and then converted into a > > subroutine call because it is now type i64, even though I want the MUL > I32 > > to remain as an operation in the architecture. MUL i32 would generate a > > 64-bit results from the lower 32-bit portions of 64-bit source operands. > > > > In customize for the operations, I am trying to do something like: > > > > case ISD::MUL: > > { > > EVT OpVT = Op.getValueType(); > > if (OpVT == MVT::i64) { > > RTLIB::Libcall LC = RTLIB::MUL_I64; > > SDValue Dummy; > > return ExpandLibCall(LC, Op, DAG, false, Dummy, *this); > > } > > else if (OpVT == MVT::i32){ > > > > ??? What to do here to not have issues with type i32 > > } > > } > > > > > > I've gone a few directions on this. > > > > Defining the architecture type i32 leads to a lot of changes that I don't > > think is the most straightforward change. > > > > When you say 'defining an architecture type' do you mean with > addRegisterClass() in your TargetLowering constructor? If so, then this > would be my recommendation. 
Can you elaborate more on what is > preventing you from doing this. > > > Would think there is a way to promote the MUL i32 types but still be able > > to "see" that as a MUL i32 somewhere down the lowering process. > > > > The R600 backend does something similar to this. It has 24-bit MUL and > MAD instructions and selects these by looking at an i32 integer and > trying to infer whether or not it is really a 24-bit value. > See the SelectI24 and SelectU24 functions in AMDGPUISelDAGToDAG.cpp. > > -Tom > > > > Are there suggestions on how to promote the type, but then be able to > > customize the original i64 to a call and the original mul i32 to an > > operation? > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -------------- next part -------------- An HTML attachment was scrubbed... URL:

From dberlin at dberlin.org Tue Jul 30 18:41:10 2013 From: dberlin at dberlin.org (Daniel Berlin) Date: Tue, 30 Jul 2013 18:41:10 -0700 Subject: [LLVMdev] creating SCEV taking too long In-Reply-To: References: Message-ID:

On Mon, Jul 29, 2013 at 8:48 PM, Guo, Xiaoyi wrote: > Thank you very much for your reply. > > > > Do you mean calculate the hash based on element SCEV pointers?

No, based on their properties (i.e. type, etc.). It will be entirely deterministic.

You have two cases: either all these SCEVs are really the same, in which case this will do nothing; or they are subtly different, but right now it's comparing 128 operands to find out. The hash helps with the second case, but not the first.
From dberlin at dberlin.org Tue Jul 30 18:46:49 2013 From: dberlin at dberlin.org (Daniel Berlin) Date: Tue, 30 Jul 2013 18:46:49 -0700 Subject: [LLVMdev] creating SCEV taking too long In-Reply-To: <49738061-70AA-4576-8093-ED674F975EA4@apple.com> References: <49738061-70AA-4576-8093-ED674F975EA4@apple.com> Message-ID: On Tue, Jul 30, 2013 at 2:20 PM, Andrew Trick wrote: > > On Jul 29, 2013, at 4:08 PM, Guo, Xiaoyi wrote: > > Hi, > > We have a benchmark where there are 128 MAD computations in a loop. (See the > attached IR.) Creating SCEVs for these expressions takes a long time, making > the compile time too long. E.g., running opt with the “indvars” pass only > takes 45 seconds. > > It seems that the majority of the time is spent in comparing the complexity > of the expression operands to sort them. > > I realize that the expression grows to be really large towards the end of > the loop. > > I don’t know of all the uses of the built SCEV. But I image it won’t be very > useful for such complex expressions. Yet, it’s making the compile time much > longer. > > So I’m wondering if it would make sense to abort the creation of SCEV when > the expression gets really complex and large. Or is there any way to further > optimize the performance of SCEV building for such cases. > > Thanks in advance for any response. > > > Nice test case. I tried printing the SCEV… oops. I haven’t seen a case this > bad, but I know you’re not the first to run into this problem. > > There are two steps in GroupByComplexity, sorting (std::sort) and grouping > (N^2). > > The sort calls SCEVComplexityCompare::compare() which can make multiple > recursive calls for nodes with multiple operands. This looks like it could > be a disaster for expressions that are not strictly trees--exponential in > the size of the DAG. > Yes, this is why i suggested computing some deterministic "complexity hash" on the way, and caching it in the SCEV. 
It would not help if they were all the same, but if they were different only at the end, you wouldn't end up comparing every operand to get there. If they are all the same, you are right that a cut-off is the only reasonable answer. Or calculate "complexity" in a way that does not require operand-by-operand comparison (i.e. compute a "complexity" number instead of a hash, as you build the SCEV). It's just trying to get a canonical sort here. This would at least make the sort fast; grouping can't be made linear unless you are willing to trust the hash :)

From me at manueljacob.de Tue Jul 30 19:02:38 2013 From: me at manueljacob.de (Manuel Jacob) Date: Wed, 31 Jul 2013 04:02:38 +0200 Subject: [LLVMdev] New ideas about how to improve garbage collection support Message-ID: <73a793656e3470e09895d38154a63810@indus.uberspace.de>

Hi,

I am currently writing an LLVM backend for RPython, a programming language and toolchain for writing dynamic language interpreters. For example, this toolchain powers PyPy, a fast Python interpreter. At the moment the implementation uses a shadow stack to manage garbage collection stack roots. To improve performance, I experimented with LLVM's garbage collection support. It works by marking a stack slot as containing a garbage collection root. This enables easy code generation for frontends which use the mem2reg pass to construct SSA form. Such frontends just need to emit an @llvm.gcroot() intrinsic referring to the AllocaInst.

RPython uses an SSA representation of variables internally. To use LLVM's GC support, it has to reassign SSA values back to stack slots, which is hard. Another issue (not only for RPython) is that using the @llvm.gcroot() intrinsic prevents mem2reg from promoting these stack slots to registers, disabling optimizations. Ending the lifetime of a stack root is only possible by storing null to the stack slot.

There were some discussions on this list about how to mark SSA values that are stack roots.
None of them resulted in the implementation of better GC root support. This was partly because most ideas assumed that a stack root must be a pointer. With the current scheme, a stack root doesn't need to be a pointer. The stack slot referred to by an @llvm.gcroot() intrinsic can contain anything: not only pointers, but also tagged unions and compressed pointers.

Other ideas involved changes to the type system. To support GC roots of any type without changing the type system, a new instruction needs to be introduced. It could be used like this:

%gcvar = call %gctype* @allocate_type()
mark_gcroot %gctype* %gcvar

For a proof of concept, which only supports pointers as GC roots, an intrinsic is enough:

%gcvar = call %gctype* @allocate_type()
%tmp = bitcast %gctype* %gcvar to i8*
call void @gcroot.mark(i8* %tmp)

What part of the garbage collector needs support from the code generator? When a collection cycle is triggered, the garbage collector needs to know what's contained in variables pointing to garbage-collected memory. A moving garbage collector might need to change these variables to point to the new memory location. The code generator emits tables that describe the layout of the stack. Using these tables and target-specific code, the garbage collection runtime is able to "walk the stack", reading and possibly writing variables pointing to garbage-collected memory.

What do these tables contain? To simplify things, I assume a simple, non-concurrent garbage collector. For each call site that could trigger a garbage collection cycle, the tables contain the size of the frame in which the call happens and the relative locations of the GC roots.

Where are the stack roots stored? The naive approach is to push all GC roots on the stack before a call and pop them after the call. A more clever approach is to integrate with the register allocator. Registers which are spilled anyway don't need to be pushed on the stack by the garbage collection support code.
Also, the values don't need to be reloaded from stack until needed to reduce register pressure. I'd call this "forced spilling". Although my C++ coding skills are quite limited, I'd like to try implementing this. My open questions are: * How to pass "this is a GC root" through SelectionDAG to the post-ISel phase? * How to force spilling of a register that lives across a call? * How to encode in IR which calls can trigger a garbage collection cycle? -Manuel From popizdeh at gmail.com Tue Jul 30 19:06:40 2013 From: popizdeh at gmail.com (Nikola Smiljanic) Date: Wed, 31 Jul 2013 12:06:40 +1000 Subject: [LLVMdev] [cfe-dev] llvm.org bug trend In-Reply-To: References: Message-ID: I'd say that bugzilla is in a very sad state. It doesn't seem to be a part of the development flow (the way commit reviews or tests are). I did some cleanup a month or two ago, mainly trying to close non issues. Went through clang Driver and some C++ bugs and found a bunch of bugs that can't be reproduced. Stuff from gcc stress tests that used to crash but no longer does. Some bugs are only partially true, like http://llvm.org/bugs/show_bug.cgi?id=14084, -print-stats is no longer exposed through the driver but -time is and it doesn't work. It's unclear if some bugs should be fixed, like http://llvm.org/bugs/show_bug.cgi?id=6488 (It's unclear if clang community is interested in supporting this). Some bugs have attached patches that fix the issue in question, like this one http://llvm.org/bugs/show_bug.cgi?id=13992 (the comments also tell a story). And so on... I would like to see bugzilla cleaned up and bug entry process somewhat formalized so that entries meet certain quality. On Wed, Jul 31, 2013 at 5:27 AM, Sean Silva wrote: > It would be really nice if our bug tracker had something like the "Issues: > 30-day summary" that this jira instance has < > https://issues.apache.org/jira/browse/LUCENE>. 
>
> Another interesting statistic that might be interesting to plot alongside
> the "open bugs" graph is "total lines of code" (or maybe just plot the
> ratio "bugs/LOC").
>
> -- Sean Silva
>
>
> On Tue, Jul 30, 2013 at 11:31 AM, Robinson, Paul <
> Paul_Robinson at playstation.sony.com> wrote:
>
>> Over most of the past year, I have been keeping an eye on the overall
>> LLVM.org open-bug count.
>>
>> Sampling the count (almost) every Monday morning, it is a consistently
>> non-decreasing number.
>>
>> I thought I’d post something about it to the Dev lists, as the count
>> broke 4000 this past week.
>>
>> For your entertainment here’s a chart that Excel produced from the data.
>> (To make it more
>> dramatic, I carefully did not use a proper zero point on the X-axis.)
>>
>> I do not have per-category breakdowns, sorry, just the raw total.
>>
>> Makes me think more seriously about cruising the bug list for something
>> that looks like
>> I could actually fix it…
>>
>> --paulr
>>
>> [image: cid:image001.png at 01CE8C56.9A40D7E0]
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>
>>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 11909 bytes
Desc: not available
URL: 

From atrick at apple.com  Tue Jul 30 19:46:40 2013
From: atrick at apple.com (Andrew Trick)
Date: Tue, 30 Jul 2013 19:46:40 -0700
Subject: [LLVMdev] creating SCEV taking too long
In-Reply-To: 
References: <49738061-70AA-4576-8093-ED674F975EA4@apple.com>
Message-ID: <76C73CD0-D9E8-4653-A196-B7E70081B5E5@apple.com>

On Jul 30, 2013, at 4:10 PM, Guo, Xiaoyi wrote:

> Hi Andy,
>
> Thanks very much for looking into the problem.
>
> In this particular test case, it seems most of the time is spent in the sorting, not the grouping.
>
> Later, I realized that it seems in this test case most of the expressions to be compared have different lengths. I tried the following change in compare() when the LHS and RHS’s types are the same:
>
> ===================================================================
> --- lib/Analysis/ScalarEvolution.cpp (revision 187379)
> +++ lib/Analysis/ScalarEvolution.cpp (working copy)
> @@ -585,6 +585,9 @@
>      case scAddExpr:
>      case scMulExpr:
>      case scSMaxExpr:
>      case scUMaxExpr: {
>        const SCEVNAryExpr *LC = cast<SCEVNAryExpr>(LHS);
>        const SCEVNAryExpr *RC = cast<SCEVNAryExpr>(RHS);
>
>        // Lexicographically compare n-ary expressions.
>        unsigned LNumOps = LC->getNumOperands(), RNumOps = RC->getNumOperands();
> +      if (LNumOps != RNumOps) {
> +        return (int)LNumOps - (int)RNumOps;
> +      }
>        for (unsigned i = 0; i != LNumOps; ++i) {
>          if (i >= RNumOps)
>            return 1;
>
> And the compile time is cut down from 45s to 1s.

Committed r187475. Thanks!

-Andy

> This will give a different sorting result than the original algorithm. However, it looks like that shouldn’t be a problem, according to this comment before the switch statement in compare():
>
>   // Aside from the getSCEVType() ordering, the particular ordering
>   // isn't very important except that it's beneficial to be consistent,
>   // so that (a + b) and (b + a) don't end up as different expressions.
>
> Does this solution seem ok?
> > If the above solution seems ok, that solves the problem for cases when most of the time the expressions to be compared have different lengths. However, the problem still exists if the expressions to be compared are large, similar, and have the same length. Maybe I’ll leave that to later when there’s a test case for such situations? > > Thanks, > Xiaoyi > > From: Andrew Trick [mailto:atrick at apple.com] > Sent: Tuesday, July 30, 2013 2:20 PM > To: Guo, Xiaoyi > Cc: LLVMdev at cs.uiuc.edu; Dan Gohman > Subject: Re: [LLVMdev] creating SCEV taking too long > > > On Jul 29, 2013, at 4:08 PM, Guo, Xiaoyi wrote: > > > Hi, > > We have a benchmark where there are 128 MAD computations in a loop. (See the attached IR.) Creating SCEVs for these expressions takes a long time, making the compile time too long. E.g., running opt with the “indvars” pass only takes 45 seconds. > > It seems that the majority of the time is spent in comparing the complexity of the expression operands to sort them. > > I realize that the expression grows to be really large towards the end of the loop. > > I don’t know of all the uses of the built SCEV. But I image it won’t be very useful for such complex expressions. Yet, it’s making the compile time much longer. > > So I’m wondering if it would make sense to abort the creation of SCEV when the expression gets really complex and large. Or is there any way to further optimize the performance of SCEV building for such cases. > > Thanks in advance for any response. > > Nice test case. I tried printing the SCEV… oops. I haven’t seen a case this bad, but I know you’re not the first to run into this problem. > > There are two steps in GroupByComplexity, sorting (std::sort) and grouping (N^2). > > The sort calls SCEVComplexityCompare::compare() which can make multiple recursive calls for nodes with multiple operands. This looks like it could be a disaster for expressions that are not strictly trees--exponential in the size of the DAG. 
>
> If you just have a very large tree with many similar looking subexpressions,
> then I’m not sure what to do except cut it into reasonable subtrees.
>
> AFAICT, it’s not just sorting that’s a problem but also grouping? Also, I
> think the sheer depth of the createSCEV recursion is itself a problem.
> I don’t see any reason not to limit the size of SCEV expressions, but I also
> don’t have a brilliant idea for how to do it at the moment (other than the
> obvious depth cutoff).
>
> -Andy

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From atrick at apple.com  Tue Jul 30 19:52:39 2013
From: atrick at apple.com (Andrew Trick)
Date: Tue, 30 Jul 2013 19:52:39 -0700
Subject: [LLVMdev] creating SCEV taking too long
In-Reply-To: 
References: <49738061-70AA-4576-8093-ED674F975EA4@apple.com>
Message-ID: 

On Jul 30, 2013, at 6:46 PM, Daniel Berlin wrote:

> On Tue, Jul 30, 2013 at 2:20 PM, Andrew Trick wrote:
>>
>> On Jul 29, 2013, at 4:08 PM, Guo, Xiaoyi wrote:
>>
>> Hi,
>>
>> We have a benchmark where there are 128 MAD computations in a loop. (See the
>> attached IR.) Creating SCEVs for these expressions takes a long time, making
>> the compile time too long. E.g., running opt with the “indvars” pass only
>> takes 45 seconds.
>>
>> It seems that the majority of the time is spent in comparing the complexity
>> of the expression operands to sort them.
>>
>> I realize that the expression grows to be really large towards the end of
>> the loop.
>>
>> I don’t know of all the uses of the built SCEV. But I imagine it won’t be very
>> useful for such complex expressions. Yet, it’s making the compile time much
>> longer.
>>
>> So I’m wondering if it would make sense to abort the creation of SCEV when
>> the expression gets really complex and large. Or is there any way to further
>> optimize the performance of SCEV building for such cases.
>>
>> Thanks in advance for any response.
>>
>>
>> Nice test case. I tried printing the SCEV… oops.
I haven’t seen a case this >> bad, but I know you’re not the first to run into this problem. >> >> There are two steps in GroupByComplexity, sorting (std::sort) and grouping >> (N^2). >> >> The sort calls SCEVComplexityCompare::compare() which can make multiple >> recursive calls for nodes with multiple operands. This looks like it could >> be a disaster for expressions that are not strictly trees--exponential in >> the size of the DAG. >> > Yes, this is why i suggested computing some deterministic "complexity > hash" on the way, and caching it in the SCEV. > It would not help if they were all the same, but if they were > different only at the end, you wouldn't end up comparing every operand > to get there. > If they are all the same, you are right that cut-off is the only > reasonable answer. > Or calculate "complexity" in a way that does not require operand by > operand comparison (IE compute a "complexity" number instead of a > hash, as you build the SCEV). It's just trying to get a canonical > sort here. > This would at least make the sort fast, grouping can't be made linear > unless you are willing to trust the hash :) That would be ideal. I wish it were obvious to me how to implement it. Xiaoyi’s simple fix handled this scenario and was totally consistent with the current implementation. It seems we’re not yet running into the issue of visiting all paths in a DAG. I don’t want to discourage a more general fix/redesign though. We could probably recover more compile time in many cases. -Andy -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From shankare at codeaurora.org  Tue Jul 30 20:17:50 2013
From: shankare at codeaurora.org (Shankar Easwaran)
Date: Tue, 30 Jul 2013 22:17:50 -0500
Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections
In-Reply-To: 
References: <51F1664F.1040003@codeaurora.org> <51F19211.6050403@codeaurora.org> <0E723450-43B9-4E32-BFE6-EEF5468128A2@apple.com> <51F6A1DB.5020909@codeaurora.org> <32658219-59C5-4667-ADAA-A0A941BD0EA2@apple.com> <51F85B52.5000604@codeaurora.org>
Message-ID: <51F881DE.60809@codeaurora.org>

On 7/30/2013 7:41 PM, Chandler Carruth wrote:
> I've not been following this thread at all. However, skimming the original
> post, I fail to find a nice summary of what problem is trying to be solved.

The proposal is trying to garbage-collect symbols that are not referenced during the link step. In addition, this proposal can be extended to keep frequently called functions/data closer together to improve cache locality.

'lld' tries to atomize all the symbols in a section, but the problem it faces on ELF is that lld needs to tie all the atoms in a section together to respect the ELF section property. This may be relaxed if lld knows, at the time of parsing the section contents, whether it really needs to tie all the atoms together in the final ELF image. This proposal is just a way to convey information from the object file to the linker that sections in the object file can be safely converted to atoms, and that the atoms need not be tied together to whichever section they reside in.

> By reading the rest of the thread I divine that the goal is faster links
> and better dead code stripping?
>
> Making that clearer would help. Naming your sections something other than
> "safe" (which has *very* different connotations) would help more.

"safe" is a property of a section by which the section can be atomized and the atoms can appear anywhere in the final output file.
>
> However, I question several fundamental assumptions:
> 1) We should be very certain that -ffunction-sections is not a viable
> solution as it exists and is well supported in other toolchains and
> environments.

-ffunction-sections and -fdata-sections would work, but that would require all third-party libraries etc. to be compiled with -ffunction-sections.

> 2) We should talk to other ELF producers and coordinate to make sure we
> don't end up creating a twisty maze of extensions here.

This is not a problem for the general ELF community, since the binutils ld/gold would not atomize sections.

> 3) We should step back and consider leapfrogging to a fully specialized
> format to reap even more performance benefits rather than messily patching
> ELF.

I don't think we should come up with another object file format.

Shankar Easwaran

From shankare at codeaurora.org  Tue Jul 30 20:21:13 2013
From: shankare at codeaurora.org (Shankar Easwaran)
Date: Tue, 30 Jul 2013 22:21:13 -0500
Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections
In-Reply-To: 
References: <51F1664F.1040003@codeaurora.org> <51F19211.6050403@codeaurora.org> <0E723450-43B9-4E32-BFE6-EEF5468128A2@apple.com> <51F6A1DB.5020909@codeaurora.org> <32658219-59C5-4667-ADAA-A0A941BD0EA2@apple.com> <51F85B52.5000604@codeaurora.org>
Message-ID: <51F882A9.9090400@codeaurora.org>

On 7/30/2013 7:43 PM, Reid Kleckner wrote:
> Can you guys invent a more specific word than "safe" to describe this
> concept? The best thing I can come up with is something like "atomizable
> section". This is basically describing code and data that the linker
> should feel free to strip and reorder, right? I'm not tracking this
> closely enough to be totally sure if that's a reasonable word for this.

'safe' becomes a property of a section when it can be converted to atoms safely and the atoms can be positioned anywhere / garbage-collected in the final output file. It could also be a synonym for an 'atomizable' section.
Thanks,

Shankar Easwaran

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation

From shankare at codeaurora.org  Tue Jul 30 20:24:24 2013
From: shankare at codeaurora.org (Shankar Easwaran)
Date: Tue, 30 Jul 2013 22:24:24 -0500
Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections
In-Reply-To: 
References: <51F1664F.1040003@codeaurora.org> <51F19211.6050403@codeaurora.org> <0E723450-43B9-4E32-BFE6-EEF5468128A2@apple.com> <868799E2-FECF-4CBA-B48A-33080712EACD@apple.com>
Message-ID: <51F88368.8060707@codeaurora.org>

On 7/30/2013 7:50 PM, Eric Christopher wrote:
> On Tue, Jul 30, 2013 at 5:36 PM, Nick Kledzik wrote:
>> On Jul 30, 2013, at 4:28 PM, Eric Christopher wrote:
>>> On Mon, Jul 29, 2013 at 9:24 AM, Nick Kledzik wrote:
>>>> On Jul 25, 2013, at 2:10 PM, Rui Ueyama wrote:
>>>>> Is there any reason -ffunction-sections and -fdata-sections wouldn't work? If it'll work, it may be better to say "if you want better linker output, use these options", rather than defining a new ELF section.
>>>> From my understanding, -ffunction-sections is a good semantic match. But it introduces a lot of bloat in the .o file which the linker must process.
>>>>
>>> Drive-by comment here:
>>>
>>> Other than the overhead of the section header, I'm not sure what bloat
>>> you're talking about here that the linker needs to process?
>> The internal model of lld is "atom" based. Each atom is an indivisible run of bytes. A compiler-generated function naturally matches that and should be an atom. The problem is that hand-written assembly code could look like it has a couple of functions, but there could be implicit dependencies (like falling through to the next function).
>>
> I'll stipulate all of this :)
>
>> If an object file has a hundred functions, that means there will be a hundred more sections (one per function).
So, if we used -ffunction-sections to determine that an object file was compiler generated, we still have the problem that an assembly language programmer could have hand written extra sections that look like -ffunction-sections would have produced, but he did something tricky like have one function with two entry symbols. So, the linker would need to double check all those hundred sections. >> > I'm not talking about using -ffunction-sections to determine if > something is compiler generated, just that there's no inherent penalty > in using -ffunction-sections in general. Basically there's no benefit > (unless you allow a flag per object, etc) that says whether or not > something is "compiler generated", you may as well just use a flag to > the linker or a section in the output (the latter is a fairly common > elf-ism). When you consider the complete link line (consisting of multiple archives and object files), you may not have all of them compiled with -ffunction-sections and -fdata-sections. Its also a problem that third party vendors would provide a library which may / may not be compiled with that flag. Thanks Shankar Easwaran -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation From tanmx_star at yeah.net Tue Jul 30 21:13:58 2013 From: tanmx_star at yeah.net (Star Tan) Date: Wed, 31 Jul 2013 12:13:58 +0800 (CST) Subject: [LLVMdev] [Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass In-Reply-To: <51F697EF.10405@grosser.es> References: <5ce10763.6ff8.14019233ac7.Coremail.tanmx_star@yeah.net> <51F213DB.10700@grosser.es> <1dd45540.5904.140258d029f.Coremail.tanmx_star@yeah.net> <51F5AC61.2080108@grosser.es> <20130729101810.GO11371MdfPADPa@purples> <51F67E1A.6060400@grosser.es> <20130729161501.GS11371MdfPADPa@purples> <51F697EF.10405@grosser.es> Message-ID: <7df533f1.1fe2.14032eeb095.Coremail.tanmx_star@yeah.net> Hi Tobias and Sven, Thanks for your discussion and suggestion. 
@Sven: ISL actually allows users to have different identifiers with the same name. The problem that we have discussed is caused by incorrect usage of isl_space in Polly, so please do not worry about the ISL library. You can skip the following information related to the Polly implementation.

@Tobias and Polly developers: I have attached a patch file to fix this problem. The key idea is to keep the ISL library always seeing the same single parameter for those memory accesses that use the same index variables, even though they have different constant base values. For the example we have seen before:

for(i = 0; i < 8; i++) {
  for (ctr = 0; ctr < 8; ctr++) {
    x1 = input[i*64 + ctr*8 + 1] ;
    x0 = input[i*64 + ctr*8 + 0] ;
    input[i*64 + ctr*8 + 0] = x0 - x1;
    input[i*64 + ctr*8 + 1] = x0 + x1;
    input[i*64 + ctr*8 + 2] = x0 * x1;
}

Without this patch file, Polly would produce the Context as follows:

Context: [p_0, p_1, p_2] -> { : p_0 >= -9223372036854775808 and p_0 <= 9223372036854775807 and p_1 >= -9223372036854775808 and p_1 <= 9223372036854775807 and p_2 >= -9223372036854775808 and p_2 <= 9223372036854775807 }
p0: {0,+,128}<%for.cond2.preheader>
p1: {2,+,128}<%for.cond2.preheader>
p2: {4,+,128}<%for.cond2.preheader>
Statements {
    Stmt_for_body6
        Domain := [p_0, p_1, p_2] -> { Stmt_for_body6[i0] : i0 >= 0 and i0 <= 7 };
        Scattering := [p_0, p_1, p_2] -> { Stmt_for_body6[i0] -> scattering[0, i0, 0] };
        ReadAccess := [p_0, p_1, p_2] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = p_0 + 16i0 };
        ReadAccess := [p_0, p_1, p_2] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = p_1 + 16i0 };
        MustWriteAccess := [p_0, p_1, p_2] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = p_0 + 16i0 };
        MustWriteAccess := [p_0, p_1, p_2] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = p_1 + 16i0 };
        MustWriteAccess := [p_0, p_1, p_2] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = p_2 +
16i0 }; } Furthermore, it will produce very complex RAW, WAW and WAR dependence as follows: RAW dependences: [p_0, p_1, p_2] -> { Stmt_for_body6[i0] -> Stmt_for_body6[o0] : (exists (e0 = [(p_1)/2], e1 = [(7p_1 + p_2 + 112i0)/16]: 16o0 = -p_0 + p_1 + 16i0 and 2e0 = p_1 and 16i0 >= -p_1 + p_2 and 16i0 <= 112 + p_0 - p_1 and p_1 >= 16 + p_0 and i0 >= 0 and 16e1 >= -15 + 7p_1 + p_2 + 112i0 and 16e1 <= -1 + 7p_1 + p_2 + 112i0 and p_2 >= 16 + p_0)) or ... or (exists (e0 = [(p_1)/2], e1 = [(-p_1 + p_2)/16]: 16o0 = p_0 - p_1 + 16i0 and 2e0 = p_1 and 16e1 = -p_1 + p_2 and 16i0 >= -p_0 + p_2 and 16i0 <= 112 - p_0 + p_1 and p_1 <= -16 + p_0 and p_2 >= 16 + p_0)) } It not only leads to significant compile-time overhead, but also produces incorrect dependence results. As we can see from the source code, there should be no RAW, WAW and WAR across loop iterations at all. With this patch file, Polly would produce the following Context and RAW, WAW, WAR dependence: Context: [p_0] -> { : p_0 >= -9223372036854775808 and p_0 <= 9223372036854775807 } p0: {0,+,128}<%for.cond2.preheader> Statements { Stmt_for_body6 Domain := [p_0] -> { Stmt_for_body6[i0] : i0 >= 0 and i0 <= 7 }; Scattering := [p_0] -> { Stmt_for_body6[i0] -> scattering[0, i0, 0] }; ReadAccess := [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = p_0 + 16i0 }; ReadAccess := [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = 2 + p_0 + 16i0 }; MustWriteAccess := [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = p_0 + 16i0 }; MustWriteAccess := [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = 2 + p_0 + 16i0 }; MustWriteAccess := [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = 4 + p_0 + 16i0 }; } RAW dependences: [p_0] -> { } WAR dependences: [p_0] -> { } WAW dependences: [p_0] -> { } We can see it only creates a single parameter and it can catch the exact RAR, WAW and WAW dependence. 
To see whether our patch file could catch normal data dependences, I have constructed another example as follows:

for(i = 0; i < 8; i++) {
  for (ctr = 0; ctr < 8; ctr++) {
    x0 = input[i*64 + ctr + 0] ;
    x1 = input[i*64 + ctr + 1] ;
    input[i*64 + ctr + 0] = x0 - x1;
    input[i*64 + ctr + 1] = x0 + x1;
    input[i*64 + ctr + 2] = x0 * x1;
}

The original Polly would produce similarly complex results. However, with the attached patch file, Polly would produce the following Context and RAW, WAW, WAR dependences:

Context: [p_0] -> { : p_0 >= -2147483648 and p_0 <= 2147483647 }
p0: {0,+,128}<%for.cond2.preheader>
Statements {
    Stmt_for_body6
        Domain := [p_0] -> { Stmt_for_body6[i0] : i0 >= 0 and i0 <= 7 };
        Scattering := [p_0] -> { Stmt_for_body6[i0] -> scattering[0, i0, 0] };
        ReadAccess := [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = p_0 + 2i0 };
        ReadAccess := [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = 2 + p_0 + 2i0 };
        MustWriteAccess := [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = p_0 + 2i0 };
        MustWriteAccess := [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = 2 + p_0 + 2i0 };
        MustWriteAccess := [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = 4 + p_0 + 2i0 };
}

RAW dependences:
  [p_0] -> { Stmt_for_body6[i0] -> Stmt_for_body6[1 + i0] : exists (e0 = [(p_0)/2]: 2e0 = p_0 and i0 >= 0 and i0 <= 6) }
WAR dependences:
  [p_0] -> { }
WAW dependences:
  [p_0] -> { Stmt_for_body6[i0] -> Stmt_for_body6[1 + i0] : exists (e0 = [(p_0)/2]: 2e0 = p_0 and i0 >= 0 and i0 <= 6)

Results show that it still creates a single parameter and catches the exact RAW, WAR and WAW dependences. Of course, this patch file makes compilation much faster, i.e., compile time is reduced from several minutes to less than 1 second.

@Tobias:
>This is obviously a hack. The base is not always a constant.
>You can probably just call use something like, >isl_pw_aff *BaseValue = visit(AR->getOperand(0)) >Affine = isl_pw_aff_sum(Affine, BaseValue); Currently, I only handle constant base values because I have not found a good way to handle general base values. isl_pw_aff_add requires two isl_pw_aff parameters, but Affine is actually of isl_aff type. Perhaps we could first commit a patch file to handle common cases, then we can considering submitting another patch file to handle general cases. >I think this is the right idea, but probably the wrong place to put it. >I would put this into SCEVValidator::visitAddRecExpr. This function >always adds the AddRecExpr itself as a parameter, whenever it is found >to be parametric. However, what we should do is to create a new ScevExpr >that starts at zero and is otherwise identical. We then add this as a >parameter. When doing this, we now also need to keep all the parameters >that have been found previously in the base expression. A lot of Polly functions access ParameterIds and Parameters using existing ScevExpr. If we create new ScevExpr and put them into Parameters and ParameterIds, it may require some extra tricks to handle the mapping from existing ScevExpr to newly created ScevExpr. I think we can consider fixing it later. Cheers, Star Tan At 2013-07-30 00:27:27,"Tobias Grosser" wrote: >On 07/29/2013 09:15 AM, Sven Verdoolaege wrote: >> On Mon, Jul 29, 2013 at 07:37:14AM -0700, Tobias Grosser wrote: >>> On 07/29/2013 03:18 AM, Sven Verdoolaege wrote: >>>> On Sun, Jul 28, 2013 at 04:42:25PM -0700, Tobias Grosser wrote: >>>>> Sven: In terms of making the behaviour of isl easier to understand, >>>>> it may make sense to fail/assert in case operands have parameters that >>>>> are named identical, but that refer to different pointer values. >>>> >>>> No, you are allowed to have different identifiers with the same name. 
>>>> I could optionally print the pointer values, but then I'd have >>>> to think about what to do with them when reading a textual >>>> representation of a set with such pointer values in them. >>> >>> Yes, this is how it is today. >> >> No, the pointer values are currently not printed. > >I was referring to the first sentence. I do not think printing pointer >values is what we want. It would make the output unpredictable not only >when address space randomisation is involved. > >>> I wondered if there is actually a need to >>> allow the use of different identifiers with the same name (except all being >>> unnamed?). I personally do not see such a need and would prefer isl to >>> assert/fail in case someone tries to do so. This may avoid confusions as >>> happened here. Do you see a reason why isl should allow this? >> >> Removing this feature would break existing users. > >Even if it would, the benefits for future users may outweigh this. >Also, are you aware of a user that actually breaks? > >Anyway, on the Polly side we know the behaviour and can handle it. So >this is nothing I am very strong about. I just mentioned it as it seemed >to be a good idea. > >Cheers, >Tobias > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 0001-ScopInfo-Create-shared-parameter-for-memory-accesses.patch Type: application/octet-stream Size: 6694 bytes Desc: not available URL: From clattner at apple.com Tue Jul 30 21:44:36 2013 From: clattner at apple.com (Chris Lattner) Date: Tue, 30 Jul 2013 21:44:36 -0700 Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: <51F7F587.30304@gmail.com> References: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> <51F692C1.2080901@codeaurora.org> <35E073FE-A0E4-4DE9-808C-050EEEDE00CE@apple.com> <51F7CF3C.9060603@codeaurora.org> <51F7F587.30304@gmail.com> Message-ID: <6237388B-B65C-4D71-8A97-C0BB6CB53018@apple.com> On Jul 30, 2013, at 10:19 AM, Shuxin Yang wrote: > The pro for running LICM early is that it may move big redundant stuff out of loop nest. You never know > how big it is. In case you are lucky , you can move lot of stuff out of > loop, the loop may become much smaller and hence enable lots of downstream optimizations. This sound > to be a big win for control-intensive programs where Loop-nest-opt normally is a big, expensive no-op. > > The con side is that, as you said, the nest is not perfect any more. However, I would argue LNO optimizations > should be able to tackle the cases when imperfect part is simple enough (say, no call, no control etc). > (FYI, Open64's LNO is able to tackle imperfect nesting so long as imperfect part is simple). Or you just reverse > the LICM, that dosen't sound hard. FWIW, I completely agree with this. The canonical form should be that loop invariants are hoisted. Optimizations should not depend on perfect loops. This concept really only makes sense for Source/AST level transformations anyway, which don't apply at the LLVM IR level. 
-Chris From westdac at gmail.com Tue Jul 30 23:38:19 2013 From: westdac at gmail.com (Dan) Date: Wed, 31 Jul 2013 00:38:19 -0600 Subject: [LLVMdev] Is there a way to check if an operation's type has been promoted Message-ID: This is more of a follow-up on an earlier question. If MUL MVT::i32 is getting promoted (to MVT::i64), is there any way to distinguish the Node later as it goes through lowering? -------------- next part -------------- An HTML attachment was scrubbed... URL: From t.p.northover at gmail.com Tue Jul 30 23:38:31 2013 From: t.p.northover at gmail.com (Tim Northover) Date: Wed, 31 Jul 2013 07:38:31 +0100 Subject: [LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64 In-Reply-To: References: <20130730195519.GE1749@L7-CNU1252LKR-172027226155.amd.com> Message-ID: Hi Dan, If you set the node's action to "Custom", you should be able to interfere in the type legalisation phase (before it gets promoted to a 64-bit MUL) by overriding the "ReplaceNodeResults" function. You could either expand it to a different libcall directly there, or replace it with a target-specific node (say XXXISD::MUL32) which claims to take i64 types but you really know is the 32-bit multiply. Then you'd have to take care of that node elsewhere, of course. Cheers. Tim. From rasha.sala7 at gmail.com Wed Jul 31 00:05:39 2013 From: rasha.sala7 at gmail.com (Rasha Omar) Date: Wed, 31 Jul 2013 09:05:39 +0200 Subject: [LLVMdev] Problem to remove successors Message-ID: Hi All, I need to remove successors from every basic block to insert new ones I tried this code, but it doesn't work void RemoveSuccessor(TerminatorInst *TI, unsigned SuccNum) { assert(SuccNum < TI->getNumSuccessors() && "Trying to remove a nonexistant successor!"); // If our old successor block contains any PHI nodes, remove the entry in the // PHI nodes that comes from this branch... 
//
BasicBlock *BB = TI->getParent();
TI->getSuccessor(SuccNum)->removePredecessor(BB);

TerminatorInst *NewTI = 0;
switch (TI->getOpcode()) {
case Instruction::Br:
  // If this is a conditional branch... convert to unconditional branch.
  if (TI->getNumSuccessors() == 2) {
    cast<BranchInst>(TI)->setUnconditionalDest(TI->getSuccessor(1-SuccNum));
  } else {                    // Otherwise convert to a return instruction...
    Value *RetVal = 0;

    // Create a value to return... if the function doesn't return null...
    if (!(BB->getParent()->getReturnType())->isVoidTy())
      RetVal = Constant::getNullValue(BB->getParent()->getReturnType());

    // Create the return...
    NewTI = 0;
  }
  break;

case Instruction::Invoke:    // Should convert to call
case Instruction::Switch:    // Should remove entry
default:
case Instruction::Ret:       // Cannot happen, has no successors!
  assert(0 && "Unhandled terminator instruction type in RemoveSuccessor!");
  abort();
}

if (NewTI)   // If it's a different instruction, replace.
  ReplaceInstWithInst(TI, NewTI);
}

Could you please help me figure out where the problem is?
Thank you

--
* Rasha Salah Omar
Msc Student at E-JUST
Demonstrator at Faculty of Computers and Informatics, Benha University*

* e-mail: rasha.omar at ejust.edu.eg*
P* Please consider the environment before printing this email.*

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From dlmeetei at gmail.com  Wed Jul 31 00:17:40 2013
From: dlmeetei at gmail.com (Devchandra L Meetei)
Date: Wed, 31 Jul 2013 12:47:40 +0530
Subject: [LLVMdev] [cfe-dev] llvm.org bug trend
In-Reply-To: 
References: 
Message-ID: 

You can get the list by searching with the Bug Changes filter, giving appropriate values in the fields, and then share the search. Other interested users can visit Preferences -> Saved Searches after logging into Bugzilla.

That said, I just created and shared this bug group under *Last30DayIssues* in bugzilla.
On Wed, Jul 31, 2013 at 12:57 AM, Sean Silva wrote:

> It would be really nice if our bug tracker had something like the "Issues:
> 30-day summary" that this jira instance has <
> https://issues.apache.org/jira/browse/LUCENE>.
>
> Another interesting statistic that might be interesting to plot alongside
> the "open bugs" graph is "total lines of code" (or maybe just plot the
> ratio "bugs/LOC").
>
> -- Sean Silva
>
>
> On Tue, Jul 30, 2013 at 11:31 AM, Robinson, Paul <
> Paul_Robinson at playstation.sony.com> wrote:
>
>> Over most of the past year, I have been keeping an eye on the overall
>> LLVM.org open-bug count.
>>
>> Sampling the count (almost) every Monday morning, it is a consistently
>> non-decreasing number.
>>
>> I thought I’d post something about it to the Dev lists, as the count
>> broke 4000 this past week.
>>
>> For your entertainment here’s a chart that Excel produced from the data.
>> (To make it more
>> dramatic, I carefully did not use a proper zero point on the X-axis.)
>>
>> I do not have per-category breakdowns, sorry, just the raw total.
>>
>> Makes me think more seriously about cruising the bug list for something
>> that looks like
>> I could actually fix it…
>>
>> --paulr
>>
>> [image: cid:image001.png at 01CE8C56.9A40D7E0]
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>
>>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>
>

-- 
Warm Regards
--Dev
OpenPegasus Developer/Committer
(\__/)
(='.'=) This is Bunny. Copy and paste bunny
(")_(") to help him gain world domination.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png Type: image/png Size: 11909 bytes Desc: not available URL: From cdavis5x at gmail.com Wed Jul 31 00:38:52 2013 From: cdavis5x at gmail.com (Charles Davis) Date: Wed, 31 Jul 2013 01:38:52 -0600 Subject: [LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI In-Reply-To: References: <51F6676E.2090800@free.fr> <51F670DF.5080804@free.fr> <878v0phswm.fsf@wanadoo.es> Message-ID: On Jul 30, 2013, at 12:41 PM, Stephen Lin wrote: >> Right. What's the point of all the effort devoted to MSVC++ ABI >> compatibility when Clang doesn't need it for being a top-notch C++ >> compiler on Windows? > > I brought up a similar point a little bit earlier, too.... > > It seems like the only necessary condition for being a first-class > native-code development tool on Windows is to support the platform C > ABI and the subset of the C++ ABI required by COM, and that is the > most that any non-MS compiler on Windows tries to do, Huh? Intel CC supports the MSVC++ ABI. Zortech (Digital Mars) supports it, too (though the guy who wrote it isn't too proud of that fact--or the fact that he even wrote that compiler to begin with). Heck, even CodeWarrior supported it (as Howard Hinnant might well remember), before Metrowerks sold off their x86 compiler backend. In fact, other than GCC, only Borland and Watcom don't support it, and that's only because they originally developed theirs during the DOS era, before MSVC++ became the de facto standard compiler. (But Watcom also used to support MFC development on Windows, so they might have supported it at one point, too.) > so I am > genuinely curious why there is so much effort being spent on the more > esoteric portions of the ABI? Because the sad truth is, the Microsoft ABI is the de facto standard on Windows--just as the Itanium ABI is the de facto standard on Linux and Mac OS. Many a third-party DLL was compiled against it, and so were a few first-party ones, like the aforementioned MFC. 
Not everyone (sadly) has the luxury of recompiling their DLLs to work with an alternative C++ ABI--and I'd imagine it'd be a support nightmare for those that do to have multiple binaries out there in the wild (Welcome to DLL Hell!! :). Seriously, I think they've had enough trouble with multiple versions; I don't think they want to deal with multiple C++ ABIs, too. (Naturally, open source software does not have the first problem, because they *can* just recompile their DLLs to work with GCC/Clang--which they often do. But still, there's a lot of legacy proprietary code out there on Windows, and the fact that open source code can be recompiled does little for the DLL Hell problem.) The fact that more than one C++ ABI isn't quite compatible with MSVC (not even enough to support COM, in some cases) is the reason Wine (for example) doesn't allow C++ code. (And yes, some of their DLLs really do need to conform to the Microsoft C++ ABI, beyond just COM.) In fact, that was the whole reason that *I* started this little subproject (supporting the MSVC ABI) in the first place. (Though I imagine that at least some of this might have taken place even without me. ;) You're right that we don't need it to be a great C++ compiler. But we do need it to attract people who'd otherwise use MSVC (if for no other reason than ABI compatibility) to our top-notch compiler. Chip From baldrick at free.fr Wed Jul 31 01:41:07 2013 From: baldrick at free.fr (Duncan Sands) Date: Wed, 31 Jul 2013 10:41:07 +0200 Subject: [LLVMdev] Strange crash with LLVM 3.3 In-Reply-To: References: Message-ID: <51F8CDA3.40101@free.fr> Hi, On 30/07/13 19:56, Stéphane Letz wrote: > Hi, > > We are embedding our DSL language + LLVM in a modified WebKit based Safari on OSX. Starting with LLVM 3.3 (it was working with LLVM 3.1...) we see the following crash: > > Any idea? > > Thanks. 
> > Stéphane Letz > > ====================== > > Process: SafariForWebKitDevelopment [79228] > Path: /Applications/Safari.app/Contents/MacOS/SafariForWebKitDevelopment > Identifier: SafariForWebKitDevelopment > Version: 7536.30.1 > Code Type: X86-64 (Native) > Parent Process: perl5.12 [79212] > User ID: 501 > > Date/Time: 2013-07-30 19:16:36.081 +0200 > OS Version: Mac OS X 10.8.4 (12E55) > Report Version: 10 > Sleep/Wake UUID: 76D14C5C-4635-4D22-83AD-A997908C17BB > > Crashed Thread: 0 Dispatch queue: com.apple.main-thread > > Exception Type: EXC_CRASH (SIGABRT) > Exception Codes: 0x0000000000000000, 0x0000000000000000 > > Application Specific Information: > /opt/local/libexec/llvm-3.3/lib/libLLVM-3.3.dylib > *** error for object 0x7fff5a075bc0: pointer being freed was not allocated try running under valgrind. Ciao, Duncan. From westdac at gmail.com Wed Jul 31 02:03:50 2013 From: westdac at gmail.com (Dan) Date: Wed, 31 Jul 2013 03:03:50 -0600 Subject: [LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64 In-Reply-To: References: <20130730195519.GE1749@L7-CNU1252LKR-172027226155.amd.com> Message-ID: Thanks, Tim. I really appreciate your insight. I'm able to use the custom lowering to get the 64-bit case to go to a subroutine, and for the 32-bit case I am generating XXXISD::MUL32. I'm not sure then what you mean about "overriding" the ReplaceNodeResults.
For ReplaceNodeResults, I'm doing: SDValue Res = LowerOperation(SDValue(N, 0), DAG); for (unsigned I = 0, E = Res->getNumValues(); I != E; ++I) Results.push_back(Res.getValue(I)); I did have to put in the following as well: SDValue LHS = Op.getOperand(0); SDValue RHS = Op.getOperand(1); LHS = DAG.getNode(ISD::ZERO_EXTEND, dl, MVT::i64, LHS); RHS = DAG.getNode(ISD::ZERO_EXTEND, dl, MVT::i64, RHS); return DAG.getNode(XXXISD::MUL32, Op->getDebugLoc(), MVT::i64, LHS, RHS); in order to get the operation to be able to go forward and match the new operation with the input operands (which were still i32 and not yet type-legalized to i64). Does this make sense to you? Here's what I am using to generate the XXXISD::MUL32: if(OpVT != MVT::i64) { //Op.getNode()->dumpr(); SDValue LHS = Op.getOperand(0); SDValue RHS = Op.getOperand(1); LHS = DAG.getNode(ISD::ZERO_EXTEND, dl, MVT::i64, LHS); RHS = DAG.getNode(ISD::ZERO_EXTEND, dl, MVT::i64, RHS); return DAG.getNode(XXXISD::MUL32, Op->getDebugLoc(), MVT::i64, LHS, RHS); } Not sure if the above is correct?
It is then running into a problem with the next ADD instruction not being able to go through PromoteIntRes_SimpleIntBinOp; it tries to check the XXXISD::MUL32: (gdb) p N->dumpr() 0x23ad0d0: i32 = add [ID=0] 0x23aff60, 0x23acfd0 0x23aff60: i64 = <> [ID=-3] 0x23b0260, 0x23b0160 0x23b0260: i64 = and [ID=-3] 0x23af660, 0x23b0060: i64 = Constant<4294967295> [ID=-3] 0x23af660: i64,ch = load [ID=-3] 0x238b068: ch = EntryToken [ID=-3], 0x23ac7d0: i64 = GlobalAddress 0 [ID=-3], 0x23ac9d0: i64 = undef [ID=-3] 0x23b0160: i64 = and [ID=-3] 0x23afc60, 0x23b0060: i64 = Constant<4294967295> [ID=-3] 0x23afc60: i64,ch = load [ID=-3] 0x238b068: ch = EntryToken [ID=-3], 0x23acbd0: i64 = GlobalAddress 0 [ID=-3], 0x23ac9d0: i64 = undef [ID=-3] 0x23acfd0: i32,ch = load [ID=-3] 0x238b068: ch = EntryToken [ID=-3], 0x23aced0: i64 = GlobalAddress 0 [ID=-3], 0x23ac9d0: i64 = undef [ID=-3] When you say that I'll have to take care of the node elsewhere, does that mean defining it as a proper way to lower? Like below? I found that if I don't then handle the XXXISD::MUL32 in LowerOperation, then after it is created during the custom change of MUL, it just dies not knowing how to lower the machine op. I would have thought that there was a default path for any XXXISD operation? And I didn't see other Targets generating their machine ops: SDValue XXXTargetLowering:: LowerOperation(SDValue Op, SelectionDAG &DAG) const { case XXXISD::MUL32: return SDValue(); Really appreciate your help and any other pointers. Dan On Wed, Jul 31, 2013 at 12:38 AM, Tim Northover wrote: > Hi Dan, > > If you set the node's action to "Custom", you should be able to > interfere in the type legalisation phase (before it gets promoted to a > 64-bit MUL) by overriding the "ReplaceNodeResults" function. > > You could either expand it to a different libcall directly there, or > replace it with a target-specific node (say XXXISD::MUL32) which > claims to take i64 types but you really know is the 32-bit multiply.
> Then you'd have to take care of that node elsewhere, of course. > > Cheers. > > Tim. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From baldrick at free.fr Wed Jul 31 02:33:45 2013 From: baldrick at free.fr (Duncan Sands) Date: Wed, 31 Jul 2013 11:33:45 +0200 Subject: [LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64 In-Reply-To: References: Message-ID: <51F8D9F9.6010409@free.fr> Hi Dan, On 30/07/13 21:14, Dan wrote: > > I'll try to run through the scenario: > > > 64-bit register type target (all registers have 64 bits). > > all 32-bit values are getting promoted to 64-bit integers > > Problem: > > MUL on i32 is getting promoted to MUL on i64 > > MUL on i64 is getting expanded to a library call in compiler-rt > > > the problem is that MUL32 gets promoted and then converted into a subroutine > call because it is now type i64, even though I want the MUL i32 to remain as an > operation in the architecture. MUL i32 would generate a 64-bit result from the > lower 32-bit portions of 64-bit source operands. I think you should register custom type promotion logic, see LegalizeIntegerTypes.cpp, line 40. When this gets passed a 32 bit multiplication, it should promote it to a 64 bit operation using the target specific node that does your special multiplication. Ciao, Duncan. > > In the custom lowering for the operations, I am trying to do something like: > > case ISD::MUL: > { > EVT OpVT = Op.getValueType(); > if (OpVT == MVT::i64) { > RTLIB::Libcall LC = RTLIB::MUL_I64; > SDValue Dummy; > return ExpandLibCall(LC, Op, DAG, false, Dummy, *this); > } > else if (OpVT == MVT::i32){ > > ??? What to do here to not have issues with type i32 > } > } > > > I've gone a few directions on this. > > Defining the architecture type i32 leads to a lot of changes that I don't think > is the most straightforward change.
> > Would think there is a way to promote the MUL i32 types but still be able to > "see" that as a MUL i32 somewhere down the lowering process. > > Are there suggestions on how to promote the type, but then be able to customize > the original i64 to a call and the original mul i32 to an operation? > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From baldrick at free.fr Wed Jul 31 02:36:23 2013 From: baldrick at free.fr (Duncan Sands) Date: Wed, 31 Jul 2013 11:36:23 +0200 Subject: [LLVMdev] Is there a way to check if an operation's type has been promoted In-Reply-To: References: Message-ID: <51F8DA97.5080002@free.fr> On 31/07/13 08:38, Dan wrote: > > > This is more of a follow-up on an earlier question. If MUL MVT::i32 is getting > promoted (to MVT::i64), is there any way to distinguish the Node later as it > goes through lowering? No. Ciao, Duncan. From michael.m.kuperstein at intel.com Wed Jul 31 02:50:07 2013 From: michael.m.kuperstein at intel.com (Kuperstein, Michael M) Date: Wed, 31 Jul 2013 09:50:07 +0000 Subject: [LLVMdev] [Proposal] Speculative execution of function calls Message-ID: <251BD6D4E6A77E4586B482B33960D2283362071E@HASMSX106.ger.corp.intel.com> Hello, Chris requested I start a fresh discussion on this, so, here goes. The previous iterations can be found here (and in follow-ups): http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130722/182590.html http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-July/064047.html Cutting to the chase, the goal is to enhance llvm::isSafeToSpeculativelyExecute() to support call instructions. isSafeToSpeculativelyExecute() is a query that, basically, determines whether it is safe to move an instruction that is executed conditionally into an unconditional context. 
One common use-case is hoisting loop-invariant instructions out of loops, during loop-invariant code motion. For example: int foo(int a, int b, int n) { int sum = 0; for(int i = 0; i < n; ++i) { int temp = a + b; sum += temp; } return sum; } Can be transformed into: int foo(int a, int b, int n) { int sum = 0; int temp = a + b; for(int i = 0; i < n; ++i) { sum += temp; } return sum; } Because hoisting the addition is safe. However, code that looks like this is more problematic: int bar(int a); int foo(int n) { int sum = 0; for(int i = 0; i < n; ++i) { int temp = bar(n); sum += temp; } return sum; } May not, in general, be transformed into int foo_bad(int n) { int sum = 0; int temp = bar(n); for(int i = 0; i < n; ++i) { sum += temp; } return sum; } The first issue is that bar() may have side effects, in which case this transformation is clearly unsafe. Unfortunately, even if bar() is marked "readnone, nounwind", this is still not a safe transformation. The problem is that the loop is not guaranteed to have even a single iteration, and even readnone functions may not always be safe to call. So, if bar() is defined like this: int bar(int a) { while(a == 0) {} return a; } Then foo(0) is safe, but foo_bad(0) is an infinite loop. Similarly, if bar() is defined as: int bar(int a) { return 1000 / a; } Then foo(0) is safe, but foo_bad(0) has undefined behavior. Unfortunately, right now, there is no way to specify that a function call IS safe to execute under any circumstances. Because of this, llvm::isSafeToSpeculativelyExecute() simply returns false for all Call instructions, except calls to some intrinsics which are special-cased, and are hard-coded into the function. What I would like to see instead is a function attribute - or a set of function attributes - that would allow isSafeToSpeculativelyExecute() to infer that it may return "true" for a given function call.
This has two main uses: 1) Intrinsics, including target-dependent intrinsics, can be marked with this attribute - hopefully a lot of intrinsics that do not have explicit side effects and do not rely on global state that is not currently modeled by "readnone" (e.g. rounding mode) will also not have any of the other issues. 2) DSL Frontends (e.g. OpenCL, my specific domain) will be able to mark library functions they know are safe. (The optimizer marking user functions as safe seems, to me, like a less likely case) I see two ways to approach this: a) Define a new attribute that says, specifically, that a function is safe to execute speculatively ("speculatable"). b) Try to define a set of orthogonal attributes that, when all of them are specified, ensure speculative execution is safe, and then add the missing ones. Option (b) sounds better in theory, but I find it problematic for two reasons - it's not clear both what the precise requirements for safety are (right now, "I know it when I see it", and I'm not sure I want to set it in stone), and what the granularity of these orthogonal attributes should be. For example, {readnone, nounwind, halting, welldefined} sounds like a good start, but I'm not sure whether "welldefined" is not too much of a catch-all, or whether this set is, in fact, exhaustive. So I'm more inclined towards (a). I'm attaching a patch that implements option (a) (the same patch from llvm-commits), but feel free to tell me it's rubbish. :-) Thanks, Michael --------------------------------------------------------------------- Intel Israel (74) Limited This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: speculatable.diff Type: application/octet-stream Size: 23140 bytes Desc: speculatable.diff URL: From richard at xmos.com Wed Jul 31 03:38:10 2013 From: richard at xmos.com (Richard Osborne) Date: Wed, 31 Jul 2013 11:38:10 +0100 Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections In-Reply-To: <51F1664F.1040003@codeaurora.org> References: <51F1664F.1040003@codeaurora.org> Message-ID: <51F8E912.4080600@xmos.com> On 25/07/13 18:54, Shankar Easwaran wrote: > Hi, > > Currently lld ties up all atoms in a section for ELF together. This > proposal just breaks it by handling it differently. > > *This requires **NO ELF ABI changes. > > *_*Definitions :-*_ > > A section is not considered safe if there is some code that appears to > be present between function boundaries (or) optimizes sections to > place data at the end or beginning of a section (that contains no symbol). > > A section is considered safe if symbols contained within the section > have been associated with their appropriate sizes and there is no data > present between function boundaries. I'd like to see a more precise definition of "safe". For example just from the above description it is not clear that "safe" disallows one function falling through into another, but based on the intended use cases this clearly isn't allowed. How is alignment handled? If I have two functions in the same section with different .align directives will these be respected when the section is split apart? Is it OK for a loop within a function to have a .align? What about relocations? If calls are implemented with branches taking pc-relative offsets then the assembler might patch in the branch offset and not emit a relocation. 
This clearly prevents functions from being removed / reordered, so I assume it is a requirement that a safe section always uses relocations for branches between functions, and if it has a choice of long or short branches it always conservatively uses a long branch. This should be made explicit in the description of safe. If you have a symbol at the same address as a function, how do you decide if it should be associated with this function or the end of the last function? Is it a requirement that there are no references to symbols defined inside the function except for the function symbol itself? If so, how does this work when you have debug info (which might have references to addresses within the function)? -- Richard Osborne | XMOS http://www.xmos.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ofv at wanadoo.es Wed Jul 31 03:40:29 2013 From: ofv at wanadoo.es (Óscar Fuentes) Date: Wed, 31 Jul 2013 12:40:29 +0200 Subject: [LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI References: <51F6676E.2090800@free.fr> <51F670DF.5080804@free.fr> <878v0phswm.fsf@wanadoo.es> Message-ID: <87y58ngggy.fsf@wanadoo.es> Charles Davis writes: > Huh? Intel CC supports the MSVC++ ABI. Zortech (Digital Mars) supports > it, too (though the guy who wrote it isn't too proud of that fact--or > the fact that he even wrote that compiler to begin with). Heck, even > CodeWarrior supported it (as Howard Hinnant might well remember), > before Metrowerks sold off their x86 compiler backend. In fact, other > than GCC, only Borland and Watcom don't support it, and that's only > because they originally developed theirs during the DOS era, before > MSVC++ became the de facto standard compiler. (But Watcom also used to > support MFC development on Windows, so they might have supported it at > one point, too.) MFC is just a library and hence unrelated to the C++ ABI.
Borland produced MFC applications without problem. You need to support some language extensions, but that's all. Oh, and IIRC you need a license for using MFC with a compiler other than MS's. If I'm not mistaken, the same requirement applies to key components such as the C++ runtime/stdlib. >> so I am >> genuinely curious why there is so much effort being spent on the more >> esoteric portions of the ABI? > Because the sad truth is, the Microsoft ABI is the de facto standard > on Windows--just as the Itanium ABI is the de facto standard on Linux > and Mac OS. Many a third-party DLL was compiled against it, and so > were a few first-party ones, like the aforementioned MFC. Not everyone > (sadly) has the luxury of recompiling their DLLs to work with an > alternative C++ ABI--and I'd imagine it'd be a support nightmare for > those that do to have multiple binaries out there in the wild (Welcome > to DLL Hell!! :). Seriously, I think they've had enough trouble with > multiple versions; I don't think they want to deal with multiple C++ > ABIs, too. Quite the contrary, knowing that Clang's C++ ABI is completely incompatible with MS is a maintenance *simplification*. Saying that supporting the MS C++ ABI is an uphill battle is an understatement (and better say MS C++ ABI*S*, because it evolved over time and it is known that it will change in future releases.) As far as I'm concerned, I'll never base my compiler-usage decisions on the advertisement of MS compatibility by Clang++, because I *know* that for a very long time (maybe forever) any MS C++ library that works with Clang++ does so by luck. That's what happens when you try to implement compatibility with an undocumented, proprietary, complex feature. > (Naturally, open source software does not have the first problem, > because they *can* just recompile their DLLs to work with > GCC/Clang--which they often do.
But still, there's a lot of legacy > proprietary code out there on Windows, and the fact that open source > code can be recompiled does little for the DLL Hell problem.) > > The fact that more than one C++ ABI isn't quite compatible with MSVC > (not even enough to support COM, in some cases) is the reason Wine > (for example) doesn't allow C++ code. (And yes, some of their DLLs > really do need to conform to the Microsoft C++ ABI, beyond just COM.) > In fact, that was the whole reason that *I* started this little > subproject (supporting the MSVC ABI) in the first place. (Though I > imagine that at least some of this might have taken place even without > me. ;) > > You're right that we don't need it to be a great C++ compiler. But we > do need it to attract people who'd otherwise use MSVC (if for no other > reason than ABI compatibility) to our top-notch compiler. Of course you are free to work on whatever you wish. I'm not criticizing your work or anybody else's. However, I'm quite surprised to see how a great deal of energy is invested in MS C++ ABI compatibility without an end in sight and with some ticklish areas ahead (MS C++ runtime(s) support, SEH support, complex LLVM change requirements (this very same thread)) while Clang++ currently cannot do several basic things on Windows (which of course are also required for being MS compatible.) Example: dllexport of C++ classes and templates. I'm also worried by the damage of claims such as "MS C++ ABI is the standard on Windows", which conveys the message that until Clang++ supports that ABI it cannot be a serious contender on Windows. Which is blatantly false. The fact is that you need MS C++ ABI compatibility only for using C++ libraries which lack source code. I doubt that the individuals and organizations that abide such a pitiful state are interested in using a compiler other than the "industry standard" MSVC++, or that the success of Clang on Windows depends on them at all.
From chandlerc at google.com Wed Jul 31 03:48:34 2013 From: chandlerc at google.com (Chandler Carruth) Date: Wed, 31 Jul 2013 03:48:34 -0700 Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections In-Reply-To: <51F881DE.60809@codeaurora.org> References: <51F1664F.1040003@codeaurora.org> <51F19211.6050403@codeaurora.org> <0E723450-43B9-4E32-BFE6-EEF5468128A2@apple.com> <51F6A1DB.5020909@codeaurora.org> <32658219-59C5-4667-ADAA-A0A941BD0EA2@apple.com> <51F85B52.5000604@codeaurora.org> <51F881DE.60809@codeaurora.org> Message-ID: On Tue, Jul 30, 2013 at 8:17 PM, Shankar Easwaran wrote: > On 7/30/2013 7:41 PM, Chandler Carruth wrote: >> I've not been following this thread at all. However, skimming the original >> post, I fail to find a nice summary of what problem is trying to be >> solved. >> > The proposal is trying to garbage collect symbols that are not referenced > during the link step. In addition this proposal can be extended to keep > frequently called functions/data closer to improve cache locality. > > 'lld' tries to atomize all the symbols in a section, but the problem it > faces on ELF is that lld needs to tie all the atoms in a section together > to respect the ELF section property. This may be relaxed if lld knows, at > the time of parsing the section contents, whether the atoms really need to > be tied together in the final ELF image. > > This proposal is just a way to convey information from the object file to > the linker that sections in the object file can be safely converted to > atoms, and they need not be tied together in whichever section they reside. OK, thanks. > > >> By reading the rest of the thread I divine that the goal is faster links >> and better dead code stripping? >> >> Making that clearer would help. Naming your sections something other than >> "safe" (which has *very* different connotations) would help more.
> "safe" is a property of a section, by which a section can be atomized and > the atoms can appear anywhere in the final output file. I don't think that this very narrow interpretation of "safe" is going to be shared by anyone who hasn't read this email from you. ;] It would need a better name, but I'm not sure we need it at all, see below. > > >> However, I question several fundamental assumptions: >> 1) We should be very certain that -ffunction-sections is not a viable >> solution as it exists and is well supported in other toolchains and >> environments. >> > -ffunction-sections and -fdata-sections would work, but that would require > all third party libraries etc to make sure that they are all compiled with > -ffunction-sections. Your proposal would require compiling all third party libraries with a compiler that supports safe sections. The goal you have is *exactly* the goal of -ffunction-sections. We don't need another mechanism to solve this problem based on the criteria you have outlined here. Your proposal is also exactly the same complexity of deployment or implementation as -ffunction-sections in that it changes the fundamental output of the toolchain to make evident the separability of the units of the code. Now, if there is a technical problem with putting functions in their own ELF sections, let's talk about that. But so far, I don't see anything even remotely compelling enough to talk about a special semantic change to certain sections. 2) We should talk to other ELF producers and coordinate to make sure we >> don't end up creating a twisty maze of extensions here. >> > This is not a problem with the general ELF community since the binutils > ld/gold would not atomize sections. Just because this won't actively explode with other ELF tools doesn't mean we shouldn't try to reach consensus throughout the larger community before changing the way in which Clang writes ELF files and LLD reads them.
We really do need to maintain interoperability (by and large) with other toolchains on the same platform. > > 3) We should step back and consider leapfrogging to a fully specialized >> format to reap even more performance benefits rather than messily patching >> ELF. >> > I don't think we should come up with another object file format. OK, there are others who disagree though. =] It is at least something that we shouldn't write off and should consider *IF* we're going to also consider the rest of the proposal. But currently, I think this entire thing already works with -ffunction-sections until there is a concrete description of why that mode simply won't work. > > > Shankar Easwaran > -------------- next part -------------- An HTML attachment was scrubbed... URL: From David.Chisnall at cl.cam.ac.uk Wed Jul 31 03:56:11 2013 From: David.Chisnall at cl.cam.ac.uk (David Chisnall) Date: Wed, 31 Jul 2013 11:56:11 +0100 Subject: [LLVMdev] [Proposal] Speculative execution of function calls In-Reply-To: <251BD6D4E6A77E4586B482B33960D2283362071E@HASMSX106.ger.corp.intel.com> References: <251BD6D4E6A77E4586B482B33960D2283362071E@HASMSX106.ger.corp.intel.com> Message-ID: <5162DAD9-2BFA-4D6F-8AB5-807330659A1F@cl.cam.ac.uk> On 31 Jul 2013, at 10:50, "Kuperstein, Michael M" wrote: > This has two main uses: > 1) Intrinsics, including target-dependent intrinsics, can be marked with this attribute – hopefully a lot of intrinsics that do not have explicit side effects and do not rely on global state that is not currently modeled by "readnone" (e.g. rounding mode) will also not have any of the other issues. > 2) DSL Frontends (e.g. OpenCL, my specific domain) will be able to mark library functions they know are safe. The slightly orthogonal question to safety is the cost of execution.
For most intrinsics that represent CPU instructions, executing them speculatively is cheaper than a conditional jump, but this is not the case for all (for example, some forms of divide instructions on in-order RISC processors). For other functions, it's even worse because the cost may be dependent on the input. Consider as a trivial example the well-loved recursive Fibonacci function. This is always safe to call speculatively, because it only touches local variables. It is, however, probably never a good idea to do so. It's also likely that the cost of a real function call is far more expensive than the elided jump, although this may not be the case on GPUs where divergent flow control is more expensive than redundant execution. Making this decision requires knowledge of both the target architecture and the complexity of the function, which may be dependent on its inputs. Even in your examples, some of the functions are only safe to speculatively execute for some subset of their inputs, and you haven't proposed a way of determining this. I suspect that much of the problem here comes from modelling intrinsics as calls in the IR, when most of them are closer to arithmetic operations. This means that optimisations have to be aware that some calls are not really calls and so don't cause any flow control effects. I wonder if it's worth revisiting some of the design of intrinsics and having some notion of target-dependent instructions. This would also help if anyone wants to try the route discussed at the San Jose DevMeeting last year of progressively lowering machine-independent IR to machine instructions. A final issue that may be relevant is parallel safety. On architectures that have very cheap userspace coroutine creation, it may be interesting to speculatively execute some functions in parallel. On others, I can imagine transforming certain longer-running calls into libdispatch invocations followed by joins. 
This, however, requires that you can detect that the call is safe to execute speculatively, doesn't have read dependencies on any shared state that might be modified, and is sufficiently expensive for the overhead of parallel execution to be worth it. This is probably a lot beyond the scope of the current discussion. David From t.p.northover at gmail.com Wed Jul 31 03:56:27 2013 From: t.p.northover at gmail.com (Tim Northover) Date: Wed, 31 Jul 2013 11:56:27 +0100 Subject: [LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64 In-Reply-To: References: <20130730195519.GE1749@L7-CNU1252LKR-172027226155.amd.com> Message-ID: From Duncan: > I think you should register custom type promotion logic, see > LegalizeIntegerTypes.cpp, line 40. When this gets passed a 32 > bit multiplication, it should promote it to a 64 bit operation > using the target specific node that does your special multiplication. I think that's what he's doing. From Dan: > SDValue LHS = Op.getOperand(0); > SDValue RHS = Op.getOperand(1); > LHS = DAG.getNode(ISD::ZERO_EXTEND, dl, MVT::i64, LHS); > RHS = DAG.getNode(ISD::ZERO_EXTEND, dl, MVT::i64, RHS); > return DAG.getNode(XXXISD::MUL32, Op->getDebugLoc(), MVT::i64, > LHS, RHS); I think you should return an ISD::TRUNCATE of that MUL32. The truncate is only temporary and will be removed when the ADD you refer to later gets promoted, but it keeps the types correct in the interim (you don't have an "i32 add" of an i64 and an i32), and it allows the legalizer to register your MUL32 as the promoted value. > When you say that I'll have to take care of the node elsewhere, does that > mean in defining it as a proper way to lower? Like below? Either lower it as you're talking about or select it from your InstrInfo.td if there's an actual instruction that will do the work. > I would have thought that there was a default path for any XXXISD operation? There's no default path for target-specific nodes.
LLVM can't possibly know what they're supposed to be. Cheers. Tim. From david.tweed at arm.com Wed Jul 31 04:36:54 2013 From: david.tweed at arm.com (David Tweed) Date: Wed, 31 Jul 2013 12:36:54 +0100 Subject: [LLVMdev] [Proposal] Speculative execution of function calls In-Reply-To: <5162DAD9-2BFA-4D6F-8AB5-807330659A1F@cl.cam.ac.uk> References: <251BD6D4E6A77E4586B482B33960D2283362071E@HASMSX106.ger.corp.intel.com> <5162DAD9-2BFA-4D6F-8AB5-807330659A1F@cl.cam.ac.uk> Message-ID: <000001ce8de2$49f129b0$ddd37d10$@tweed@arm.com> Hi, | I suspect that much of the problem here comes from modelling intrinsics as calls in the IR, when most of them are closer to arithmetic operations. This means that optimisations have to be aware that | some calls are not really calls and so don't cause any flow control effects. I wonder if it's worth revisiting some of the design of intrinsics and having some notion of target-dependent | instructions. This would also help if anyone wants to try the route discussed at the San Jose DevMeeting last year of progressively lowering machine-independent IR to machine instructions. The only thing I'd say is that I think it's a mistake to try to separate out real "intrinsics" and "function calls" that should be speculatable when we're at the level of platform independent optimizations (afterwards it may make more sense). Depending on how exotic your hardware is, there may be a lot of things that are implemented in a speculation-safe way that you'd like to represent (at the mid-level) as calls rather than as explicit LLVM IR intrinsics. For example, I expect which OpenCL "built-in functions" are functions and which are specialised in hardware varies significantly from device to device. Having different paths for "speculating" intrinsics and "functions which may or may not (depending on the back-end) be an intrinsic" seems to have a lot of potential for algorithm duplication that's prone to drifting out of sync. 
Cheers, Dave From mkh159 at gmail.com Wed Jul 31 04:47:41 2013 From: mkh159 at gmail.com (m kh) Date: Wed, 31 Jul 2013 16:17:41 +0430 Subject: [LLVMdev] error on compiling vmkit Message-ID: Hi all, After "BUILD SUCCESSFUL" Message on compiling VMkit. after the make starts compiling mmtk-vmkit.jar In verbose mode the following command errors out: /home/usr/vmkit/Release+Asserts/bin/vmjc -load=/home/usr/vmkit/Release+Asserts/lib/MMTKRuntime.so -load=/home/usr/vmkit/Release+Asserts/lib/MMTKMagic.so -LowerMagic /home/usr/vmkit/mmtk/java/Release+Asserts/mmtk-vmkit.jar -disable-exceptions -disable-cooperativegc -with-clinit=org/mmtk/vm/VM,org/mmtk/utility/*,org/mmtk/policy/*,org/j3/config/* -Dmmtk.hostjvm=org.j3.mmtk.Factory -o /home/usr/vmkit/mmtk/java/Release+Asserts/mmtk-vmkit-lower.bc -Dmmtk.properties=/home/usr/vmkit/mmtk/java/vmkit.properties -disable-stubs -assume-compiled The Error: BUILD SUCCESSFUL Total time: 4 seconds [vmkit ./mmtk/java]: Compiling 'mmtk-vmkit.jar' Failed to load /usr/lib/jvm/java-6-openjdk-amd64/jre/lib/amd64/libjava.so, cannot proceed: libjvm.so: cannot open shared object file: No such file or directory 0 vmjc 0x0000000001056e72 llvm::sys::PrintStackTrace(_IO_FILE*) + 34 1 vmjc 0x0000000001056ac9 2 libpthread.so.0 0x00002b8a5b47f030 3 libc.so.6 0x00002b8a5c26b475 gsignal + 53 4 libc.so.6 0x00002b8a5c26e6f0 abort + 384 5 vmjc 0x00000000005fdd7b 6 vmjc 0x00000000005f3f82 j3::Jnjvm::loadBootstrap() + 626 7 vmjc 0x000000000058df80 mainCompilerStart(j3::JavaThread*) + 1376 8 vmjc 0x0000000000618828 vmkit::Thread::internalThreadStart(vmkit::Thread*) + 680 9 libpthread.so.0 0x00002b8a5b476b50 10 libc.so.6 0x00002b8a5c313a7d clone + 109 Aborted java -version: java version "1.6.0_27" OpenJDK Runtime Environment (IcedTea6 1.12.5) (6b27-1.12.5-1) OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode) Best regards, Mkh From h.bakiras at gmail.com Wed Jul 31 04:58:40 2013 From: h.bakiras at gmail.com (Harris BAKIRAS) Date: Wed, 31 Jul 2013 13:58:40 +0200 
Subject: [LLVMdev] error on compiling vmkit In-Reply-To: References: Message-ID: <51F8FBF0.3000702@gmail.com> Hi, You just need to add the directory containing libjvm.so in your LD_LIBRARY_PATH environment variable. By default if you installed package openjdk-6-jdk, for 64 bits architecture path is : /usr/lib/jvm/java-6-openjdk-amd64/jre/lib/amd64/server/ Regards, Harris Bakiras On 07/31/2013 01:47 PM, m kh wrote: > Hi all, > > After "BUILD SUCCESSFUL" Message on compiling VMkit. > > after the make starts compiling mmtk-vmkit.jar > In verbose mode the following command errors out: > /home/usr/vmkit/Release+Asserts/bin/vmjc > -load=/home/usr/vmkit/Release+Asserts/lib/MMTKRuntime.so > -load=/home/usr/vmkit/Release+Asserts/lib/MMTKMagic.so -LowerMagic > /home/usr/vmkit/mmtk/java/Release+Asserts/mmtk-vmkit.jar > -disable-exceptions -disable-cooperativegc > -with-clinit=org/mmtk/vm/VM,org/mmtk/utility/*,org/mmtk/policy/*,org/j3/config/* > -Dmmtk.hostjvm=org.j3.mmtk.Factory -o > /home/usr/vmkit/mmtk/java/Release+Asserts/mmtk-vmkit-lower.bc > -Dmmtk.properties=/home/usr/vmkit/mmtk/java/vmkit.properties > -disable-stubs -assume-compiled > > The Error: > > BUILD SUCCESSFUL > Total time: 4 seconds > [vmkit ./mmtk/java]: Compiling 'mmtk-vmkit.jar' > Failed to load /usr/lib/jvm/java-6-openjdk-amd64/jre/lib/amd64/libjava.so, > cannot proceed: > libjvm.so: cannot open shared object file: No such file or directory > 0 vmjc 0x0000000001056e72 llvm::sys::PrintStackTrace(_IO_FILE*) + 34 > 1 vmjc 0x0000000001056ac9 > 2 libpthread.so.0 0x00002b8a5b47f030 > 3 libc.so.6 0x00002b8a5c26b475 gsignal + 53 > 4 libc.so.6 0x00002b8a5c26e6f0 abort + 384 > 5 vmjc 0x00000000005fdd7b > 6 vmjc 0x00000000005f3f82 j3::Jnjvm::loadBootstrap() + 626 > 7 vmjc 0x000000000058df80 mainCompilerStart(j3::JavaThread*) + 1376 > 8 vmjc 0x0000000000618828 > vmkit::Thread::internalThreadStart(vmkit::Thread*) + 680 > 9 libpthread.so.0 0x00002b8a5b476b50 > 10 libc.so.6 0x00002b8a5c313a7d clone + 109 > Aborted > > > 
java -version: > java version "1.6.0_27" > OpenJDK Runtime Environment (IcedTea6 1.12.5) (6b27-1.12.5-1) > OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode) > > Best regards, > Mkh > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From listiges at arcor.de Wed Jul 31 05:16:26 2013 From: listiges at arcor.de (Nico) Date: Wed, 31 Jul 2013 14:16:26 +0200 Subject: [LLVMdev] MachineBasicBlocks Cloning Message-ID: <31038302-43CA-4E8F-B9FB-B510E4BC3CE8@arcor.de> Hi, For some schedulers like Trace Scheduling it is necessary to clone basic blocks. Instinctively I would think the "Machine Instruction Scheduler" would be the right place to do so. Is it possible to clone MachineBasicBlocks in the "Machine Instruction Scheduler" pass? Any snares? Or is it too much effort to implement it there, and is there a better place for such things? Thank you, Nico From michael.m.kuperstein at intel.com Wed Jul 31 05:18:05 2013 From: michael.m.kuperstein at intel.com (Kuperstein, Michael M) Date: Wed, 31 Jul 2013 12:18:05 +0000 Subject: [LLVMdev] [Proposal] Speculative execution of function calls In-Reply-To: <5162DAD9-2BFA-4D6F-8AB5-807330659A1F@cl.cam.ac.uk> References: <251BD6D4E6A77E4586B482B33960D2283362071E@HASMSX106.ger.corp.intel.com> <5162DAD9-2BFA-4D6F-8AB5-807330659A1F@cl.cam.ac.uk> Message-ID: <251BD6D4E6A77E4586B482B33960D2283362087F@HASMSX106.ger.corp.intel.com> Whether cost is an issue depends on the specific use of speculative execution. In the context of LICM, I believe it is almost always a good idea to hoist, as loop counts of 0 are relatively rare. This applies especially to expensive functions. As to the use of speculative execution purely to elide jumps - right now, cost is not a factor in the isSafeTo...() decision in any case. 
A memory load may also be much more expensive than a jump, but loads, when possible, are still considered safe. So, I think this is indeed orthogonal - cost should be a separate query, perhaps. Some passes may want to perform it (In fact, SimplifyCFG already has an internal ComputeSpeculationCost() method), while others will want to speculate whenever possible (LICM). As to being safe for only a subset of inputs - if a function is safe only for a subset of inputs, it's not safe, just like a function that is readonly for a subset of inputs is not readonly. ;-) -----Original Message----- From: Dr D. Chisnall [mailto:dc552 at hermes.cam.ac.uk] On Behalf Of David Chisnall Sent: Wednesday, July 31, 2013 13:56 To: Kuperstein, Michael M Cc: LLVMdev at cs.uiuc.edu Subject: Re: [LLVMdev] [Proposal] Speculative execution of function calls On 31 Jul 2013, at 10:50, "Kuperstein, Michael M" wrote: > This has two main uses: > 1) Intrinsics, including target-dependent intrinsics, can be marked with this attribute - hopefully a lot of intrinsics that do not have explicit side effects and do not rely on global state that is not currently modeled by "readnone" (e.g. rounding mode) will also not have any of the other issues. > 2) DSL Frontends (e.g. OpenCL, my specific domain) will be able to mark library functions they know are safe. The slightly orthogonal question to safety is the cost of execution. For most intrinsics that represent CPU instructions, executing them speculatively is cheaper than a conditional jump, but this is not the case for all (for example, some forms of divide instructions on in-order RISC processors). For other functions, it's even worse because the cost may be dependent on the input. Consider as a trivial example the well-loved recursive Fibonacci function. This is always safe to call speculatively, because it only touches local variables. It is, however, probably never a good idea to do so. 
It's also likely that the cost of a real function call is far more expensive than the elided jump, although this may not be the case on GPUs where divergent flow control is more expensive than redundant execution. Making this decision requires knowledge of both the target architecture and the complexity of the function, which may be dependent on its inputs. Even in your examples, some of the functions are only safe to speculatively execute for some subset of their inputs, and you haven't proposed a way of determining this. I suspect that much of the problem here comes from modelling intrinsics as calls in the IR, when most of them are closer to arithmetic operations. This means that optimisations have to be aware that some calls are not really calls and so don't cause any flow control effects. I wonder if it's worth revisiting some of the design of intrinsics and having some notion of target-dependent instructions. This would also help if anyone wants to try the route discussed at the San Jose DevMeeting last year of progressively lowering machine-independent IR to machine instructions. A final issue that may be relevant is parallel safety. On architectures that have very cheap userspace coroutine creation, it may be interesting to speculatively execute some functions in parallel. On others, I can imagine transforming certain longer-running calls into libdispatch invocations followed by joins. This, however, requires that you can detect that the call is safe to execute speculatively, doesn't have read dependencies on any shared state that might be modified, and is sufficiently expensive for the overhead of parallel execution to be worth it. This is probably a lot beyond the scope of the current discussion. David --------------------------------------------------------------------- Intel Israel (74) Limited This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). 
Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. From rasha.sala7 at gmail.com Wed Jul 31 05:20:52 2013 From: rasha.sala7 at gmail.com (Rasha Omar) Date: Wed, 31 Jul 2013 14:20:52 +0200 Subject: [LLVMdev] Instruction insertion By Module Pass In-Reply-To: <51F7C731.1040404@illinois.edu> References: <51F7C731.1040404@illinois.edu> Message-ID: Thank you for your help I tried Instruction* p=&( Bb->front()); Type * Int32Type = IntegerType::getInt32Ty(getGlobalContext()); AllocaInst* newInst = new AllocaInst(Int32Type,"flag", p); that works well but I need to store the value of the variable too. What's the method that could be used to store specific value?? On 30 July 2013 16:01, John Criswell wrote: > On 7/30/13 7:44 AM, Rasha Omar wrote: > > Hi, > I need to insert new instruction into every basic block like x=1 > or while loop > I tried this code, but it doesn't work > > Type * Int32Type = IntegerType::getInt32Ty(getGlobalContext()); > AllocaInst* newInst = new AllocaInst(Int32Type,"flag", Bb); > Bb->getInstList().push_back(newInst); > > > The problem is that you've inserted the AllocaInst into the basic block > via the AllocaInst constructor (note the Bb at the end of the line with new > AllocaInst). You then attempt to insert the AllocaInst into the BasicBlock > Bb a second time with the last line. Note that the assertion is telling > you that you're inserting the alloca instruction twice. > > Remove the last line, and it should fix your problem. > > -- John T. > > > the error: > void llvm::SymbolTableListTraits llvm::BasicBlock>::addNodeToList(ValueSubClass *) [ValueSubClass = > llvm::Instruction, ItemParentClass = llvm::BasicBlock]: Assertion > `V->getParent() == 0 && "Value already in a container!!"' failed. > > Is there a class I could use to insert while loop in Module Pass? 
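For the question above about storing a value into the new slot, a minimal sketch against the LLVM 3.x C++ API (untested here; `insertFlag` is a hypothetical helper name, and `p` is the insertion point from the snippet above):

```cpp
#include "llvm/IR/Constants.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Type.h"

using namespace llvm;

// Allocate an i32 slot named "flag" and store the constant 1 into it,
// inserting both instructions before `p` (e.g. the first instruction
// of the basic block, as in the snippet above).
void insertFlag(Instruction *p) {
  Type *Int32Ty = Type::getInt32Ty(p->getContext());
  AllocaInst *Flag = new AllocaInst(Int32Ty, "flag", p);
  Value *One = ConstantInt::get(Int32Ty, 1);
  new StoreInst(One, Flag, /*isVolatile=*/false, p);
}
```

StoreInst takes the value to store, the pointer to store through, and an insertion point, mirroring the AllocaInst constructor used above.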
> > Thank you in advance > > -- > * Rasha Salah Omar > Msc Student at E-JUST > Demonestrator at Faculty of Computers and Informatics > Benha University* > > * e-mail: rasha.omar at ejust.edu.eg* > P* Please consider the environment before printing this email.* > > > > _______________________________________________ > LLVM Developers mailing listLLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > -- * Rasha Salah Omar Msc Student at E-JUST Demonestrator at Faculty of Computers and Informatics Benha University* * e-mail: rasha.omar at ejust.edu.eg* P* Please consider the environment before printing this email.* -------------- next part -------------- An HTML attachment was scrubbed... URL: From renato.golin at linaro.org Wed Jul 31 05:32:32 2013 From: renato.golin at linaro.org (Renato Golin) Date: Wed, 31 Jul 2013 13:32:32 +0100 Subject: [LLVMdev] [Proposal] Speculative execution of function calls In-Reply-To: <5162DAD9-2BFA-4D6F-8AB5-807330659A1F@cl.cam.ac.uk> References: <251BD6D4E6A77E4586B482B33960D2283362071E@HASMSX106.ger.corp.intel.com> <5162DAD9-2BFA-4D6F-8AB5-807330659A1F@cl.cam.ac.uk> Message-ID: On 31 July 2013 11:56, David Chisnall wrote: > The slightly orthogonal question to safety is the cost of execution. For > most intrinsics that represent CPU instructions, executing them > speculatively is cheaper than a conditional jump, but this is not the case > for all (for example, some forms of divide instructions on in-order RISC > processors). For other functions, it's even worse because the cost may be > dependent on the input. Consider as a trivial example the well-loved > recursive Fibonacci function. This is always safe to call speculatively, > because it only touches local variables. It is, however, probably never a > good idea to do so. 
It's also likely that the cost of a real function call > is far more expensive than the elided jump, although this may not be the > case on GPUs where divergent flow control is more expensive than redundant > execution. Making this decision requires knowledge of both the target > architecture and the complexity of the function, which may be dependent on > its inputs. Even in your examples, some of the functions are only safe to > speculatively execute for some subset of their inputs, and you haven't > proposed a way of determining this. > David, If I got it right, this is a proposal for a framework to annotate speculation-safe functions, not a pass that will identify all cases. So, yes, different back-ends can annotate their safe intrinsics, front-ends can annotate their safe calls, and it'll always be a small subset, as with most other optimizations. As for letting optimization passes use that info, well, it could in theory be possible to count the number of instructions on the callee, and make sure it has no other calls, side-effects or undefined behaviour, and again, that would have to be very conservative. cheers, --renato -------------- next part -------------- An HTML attachment was scrubbed... URL: From kparzysz at codeaurora.org Wed Jul 31 06:53:13 2013 From: kparzysz at codeaurora.org (Krzysztof Parzyszek) Date: Wed, 31 Jul 2013 08:53:13 -0500 Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: <6237388B-B65C-4D71-8A97-C0BB6CB53018@apple.com> References: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> <51F692C1.2080901@codeaurora.org> <35E073FE-A0E4-4DE9-808C-050EEEDE00CE@apple.com> <51F7CF3C.9060603@codeaurora.org> <51F7F587.30304@gmail.com> <6237388B-B65C-4D71-8A97-C0BB6CB53018@apple.com> Message-ID: <51F916C9.9060706@codeaurora.org> On 7/30/2013 11:44 PM, Chris Lattner wrote: > > The canonical form should be that loop invariants are hoisted. 
The canonical form should not depend on the knowledge as to what is invariant and what isn't. It has more to do with preserving certain "common" properties of a loop, such as header, preheader, latch branch, etc. > Optimizations should not depend on perfect loops. What do you mean by "perfect loops"? I was talking about perfect nests. -K -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation From rafael.espindola at gmail.com Wed Jul 31 07:17:53 2013 From: rafael.espindola at gmail.com (=?UTF-8?Q?Rafael_Esp=C3=ADndola?=) Date: Wed, 31 Jul 2013 10:17:53 -0400 Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections In-Reply-To: References: <51F1664F.1040003@codeaurora.org> <51F19211.6050403@codeaurora.org> <0E723450-43B9-4E32-BFE6-EEF5468128A2@apple.com> <51F6A1DB.5020909@codeaurora.org> <32658219-59C5-4667-ADAA-A0A941BD0EA2@apple.com> <51F85B52.5000604@codeaurora.org> Message-ID: > #1 can be solved via science. I posted the numbers for -ffunction-sections -fdata-sections when building clang. The overhead is noticeable: 26% larger object files. The overhead of doing this with just symbols is more than just the flag itself. The assembler (MC) has to avoid resolving relocations, but that should not be as bad as 26%. Each section on ELF 64 costs 64 bytes + the name of the section. But I agree. We should do -ffunction-sections + -fdata-sections first. The format extension (flag section?) then becomes just an optimization. BTW, I do think we need more than just one flag section. If for nothing else we need to make sure we don't break __attribute__((used)). In fact, I think we will need that even for -ffunction-sections. 
Cheers, Rafael From rafael.espindola at gmail.com Wed Jul 31 07:21:15 2013 From: rafael.espindola at gmail.com (=?UTF-8?Q?Rafael_Esp=C3=ADndola?=) Date: Wed, 31 Jul 2013 10:21:15 -0400 Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections In-Reply-To: <51F881DE.60809@codeaurora.org> References: <51F1664F.1040003@codeaurora.org> <51F19211.6050403@codeaurora.org> <0E723450-43B9-4E32-BFE6-EEF5468128A2@apple.com> <51F6A1DB.5020909@codeaurora.org> <32658219-59C5-4667-ADAA-A0A941BD0EA2@apple.com> <51F85B52.5000604@codeaurora.org> <51F881DE.60809@codeaurora.org> Message-ID: >> However, I question several fundamental assumptions: >> 1) We should be very certain that -ffunction-sections is not a viable >> solution as it exists and is well supported in other toolchains and >> environments. > > -ffunction-sections and -fdata-sections would work, but that would require > all third party libraries etc to make sure that they are all compiled with > -ffunction-sections. No, just for them to be GCed. In the same way the proposals for having "safe to move symbols" would need some flags to be present in the .o files. The flags are really just an optimization. > >> 2) We should talk to other ELF producers and coordinate to make sure we >> don't end up creating a twisty maze of extensions here. > > This is not a problem with the general ELF community since the binutils > ld/gold would not atomize sections. There is no reason they could not. 
Cheers, Rafael From tobias at grosser.es Wed Jul 31 07:50:57 2013 From: tobias at grosser.es (Tobias Grosser) Date: Wed, 31 Jul 2013 07:50:57 -0700 Subject: [LLVMdev] [Polly] Update of Polly compile-time performance on LLVM test-suite In-Reply-To: <6a634f6b.e.14030889341.Coremail.tanmx_star@yeah.net> References: <6a634f6b.e.14030889341.Coremail.tanmx_star@yeah.net> Message-ID: <51F92451.9070503@grosser.es> On 07/30/2013 10:03 AM, Star Tan wrote: > Hi Tobias and all Polly developers, > > I have re-evaluated the Polly compile-time performance using newest > LLVM/Polly source code. You can view the results on > http://188.40.87.11:8000 > . > > Especially, I also evaluated our r187102 patch file that avoids expensive > failure string operations in normal execution. Specifically, I evaluated > two cases for it: > > Polly-NoCodeGen: clang -O3 -load LLVMPolly.so -mllvm > -polly-optimizer=none -mllvm -polly-code-generator=none > http://188.40.87.11:8000/db_default/v4/nts/16?compare_to=9&baseline=9&aggregation_fn=median > Polly-Opt: clang -O3 -load LLVMPolly.so -mllvm -polly > http://188.40.87.11:8000/db_default/v4/nts/18?compare_to=11&baseline=11&aggregation_fn=median > > The "Polly-NoCodeGen" case is mainly used to compare the compile-time > performance for the polly-detect pass. As shown in the results, our > patch file could significantly reduce the compile-time overhead for some > benchmarks such as tramp3dv4 > (24.2%), simple_types_constant_folding > (12.6%), > oggenc > (9.1%), > loop_unroll > (7.8%) Very nice! Though I am surprised to also see performance regressions. They are all in very shortly executing kernels, so they may very well be measuring noise. Is this really the case? Also, it may be interesting to compare against the non-polly case to see how much overhead there is still due to our scop detection. > The "Polly-opt" case is used to compare the whole compile-time > performance of Polly. 
Since our patch file mainly affects the > Polly-Detect pass, it shows similar performance to "Polly-NoCodeGen". As > shown in results, it reduces the compile-time overhead of some > benchmarks such as tramp3dv4 > (23.7%), simple_types_constant_folding > (12.9%), > oggenc > (8.3%), > loop_unroll > (7.5%) > > At last, I also evaluated the performance of the ScopBottomUp patch that > changes the up-down scop detection into bottom-up scop detection. > Results can be viewed by: > pNoCodeGen-ScopBottomUp: clang -O3 -load LLVMPolly.so (v.s. > LLVMPolly-ScopBottomUp.so) -mllvm -polly-optimizer=none -mllvm > -polly-code-generator=none > http://188.40.87.11:8000/db_default/v4/nts/21?compare_to=16&baseline=16&aggregation_fn=median > pOpt-ScopBottomUp: clang -O3 -load LLVMPolly.so (v.s. > LLVMPolly-ScopBottomUp.so) -mllvm -polly > http://188.40.87.11:8000/db_default/v4/nts/19?compare_to=18&baseline=18&aggregation_fn=median > (*Both of these results are based on LLVM r187116, which has included > the r187102 patch file that we discussed above) > > Please notice that this patch file will lead to some errors in > Polly-tests, so the data shown here can not be regarded as confident > results. For example, this patch can significantly reduce the > compile-time overhead of SingleSource/Benchmarks/Shootout/nestedloop > only > because it regards the nested loop as an invalid scop and skips all > following transformations and optimizations. However, I evaluated it > here to see its potential performance impact. Based on the results > shown on > http://188.40.87.11:8000/db_default/v4/nts/21?compare_to=16&baseline=16&aggregation_fn=median, > we can see detecting scops bottom-up may further reduce Polly > compile-time by more than 10%. Interesting. For some reason it also regresses huffbench quite a bit. :-( I think here an up-to-date non-polly to polly comparison would come in handy to see which benchmarks we still see larger performance regressions. 
And if the bottom-up scop detection actually helps here. As this is a larger patch, we should really have a need for it before switching to it. Cheers, Tobias From tobias at grosser.es Wed Jul 31 08:10:23 2013 From: tobias at grosser.es (Tobias Grosser) Date: Wed, 31 Jul 2013 08:10:23 -0700 Subject: [LLVMdev] [Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass In-Reply-To: <7df533f1.1fe2.14032eeb095.Coremail.tanmx_star@yeah.net> References: <5ce10763.6ff8.14019233ac7.Coremail.tanmx_star@yeah.net> <51F213DB.10700@grosser.es> <1dd45540.5904.140258d029f.Coremail.tanmx_star@yeah.net> <51F5AC61.2080108@grosser.es> <20130729101810.GO11371MdfPADPa@purples> <51F67E1A.6060400@grosser.es> <20130729161501.GS11371MdfPADPa@purples> <51F697EF.10405@grosser.es> <7df533f1.1fe2.14032eeb095.Coremail.tanmx_star@yeah.net> Message-ID: <51F928DF.1070006@grosser.es> On 07/30/2013 09:13 PM, Star Tan wrote: > Hi Tobias and Sven, > > Thanks for your discussion and suggestion.> > @Tobias and Polly developers: > I have attached a patch file to fix this problem. The key idea is to > keep ISL library always seeing the same single parameter for those > memory accesses that use the same index variables even though they have > different constant base values. Good. [.. Very nice improvements .. ] >>This is obviously a hack. The base is not always a constant. >>You can probably just call use something like, >>isl_pw_aff *BaseValue = visit(AR->getOperand(0)) >>Affine = isl_pw_aff_sum(Affine, BaseValue); > > Currently, I only handle constant base values because I have not found a > good way to handle general base values. isl_pw_aff_add requires two > isl_pw_aff parameters, but Affine is actually of isl_aff type. Perhaps > we could first commit a patch file to handle common cases, then we can > considering submitting another patch file to handle general cases. No, we should get this right from the start (and also add test cases). 
isl provides a function to create an isl_pw_aff from a isl_aff. You can just use that. >>I think this is the right idea, but probably the wrong place to put it. >>I would put this into SCEVValidator::visitAddRecExpr. This function >>always adds the AddRecExpr itself as a parameter, whenever it is found >>to be parametric. However, what we should do is to create a new ScevExpr >>that starts at zero and is otherwise identical. We then add this as a >>parameter. When doing this, we now also need to keep all the parameters >>that have been found previously in the base expression. > > A lot of Polly functions access ParameterIds and Parameters using existing ScevExpr. If we create new ScevExpr and put them into Parameters and ParameterIds, it may require some extra tricks to handle the mapping from existing ScevExpr to newly created ScevExpr. I think we can consider fixing it later. Mh, I would think similar tricks are needed to perform your parameter remapping. I would prefer we think this through now and then choose a solution that works for us. Tobi From rafael.espindola at gmail.com Wed Jul 31 08:19:47 2013 From: rafael.espindola at gmail.com (=?UTF-8?Q?Rafael_Esp=C3=ADndola?=) Date: Wed, 31 Jul 2013 11:19:47 -0400 Subject: [LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI In-Reply-To: References: Message-ID: > This proposal puts this complexity in the backend. The backend will lay out > the outgoing argument slots as required by the ABI, and alloca pointer will > be resolved to point to the appropriate outgoing argument slot. OK, so it makes the backend and verifier a bit more complex, but has a smaller IR extension. That is probably a good tradeoff. 
Thanks, Rafael From dag at cray.com Wed Jul 31 08:19:52 2013 From: dag at cray.com (dag at cray.com) Date: Wed, 31 Jul 2013 10:19:52 -0500 Subject: [LLVMdev] Maintaining LiveIn Message-ID: I would like to maintain the livein information for physical registers on basic blocks past register allocation, or recreate it if possible. The goal is to be able to run a late pass of DeadMachineInstrElim, which requires valid livein information. The X86 target returns false for requiresRegisterScavenging so passes like BranchFolding don't update the livein information. At that point I gather that DeadMachineInstrElim will break. What are the requirements to return true for requiresRegisterScavenging? ARM does but I don't know if special care has been taken in that target to allow it. Alternatively, are there better ways to update or recreate the livein information in a late pass? I don't want to run a whole dataflow analysis if I don't have to. Thanks! -David From shankare at codeaurora.org Wed Jul 31 08:25:28 2013 From: shankare at codeaurora.org (Shankar Easwaran) Date: Wed, 31 Jul 2013 10:25:28 -0500 Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections In-Reply-To: References: <51F1664F.1040003@codeaurora.org> <51F19211.6050403@codeaurora.org> <0E723450-43B9-4E32-BFE6-EEF5468128A2@apple.com> <51F6A1DB.5020909@codeaurora.org> <32658219-59C5-4667-ADAA-A0A941BD0EA2@apple.com> <51F85B52.5000604@codeaurora.org> <51F881DE.60809@codeaurora.org> Message-ID: <51F92C68.7030900@codeaurora.org> > Just because this won't actively explode with other ELF tools doesn't > mean we shouldn't try to reach consensus throughout the larger > community before changing the way in which Clang writes ELF files and > LLD reads them. We really do need to maintain interoperability (by and > large) with other toolchains on the same platform. Even with these changes, all the tools are going to be still interoperable. The only consumer of the new section would be just clang based tools. 
> > > OK, there are others who disagree though. =] It is at least something that > we shouldn't write off and should consider *IF* we're going to also > consider the rest of the proposal. But currently, I think this entire thing > already works with -ffunction-sections until there is a concrete > description of why that mode simply won't work. I agree to this, but since -ffunction-sections and -fdata-sections have not been the default over all these years, I put up this proposal. Thanks Shankar Easwaran -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation From shankare at codeaurora.org Wed Jul 31 08:32:12 2013 From: shankare at codeaurora.org (Shankar Easwaran) Date: Wed, 31 Jul 2013 10:32:12 -0500 Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections In-Reply-To: <51F8E912.4080600@xmos.com> References: <51F1664F.1040003@codeaurora.org> <51F8E912.4080600@xmos.com> Message-ID: <51F92DFC.2070004@codeaurora.org> Thanks for your very detailed analysis. From other email conversations, it looks like -ffunction-sections and -fdata-sections are doing what is being iterated in the original proposal. On 7/31/2013 5:38 AM, Richard Osborne wrote: > I'd like to see a more precise definition of "safe". For example just > from the above description it is not clear that "safe" disallows one > function falling through into another, but based on the intended use > cases this clearly isn't allowed. > Doesn't this break the model even with ELF? For example, if the code had been compiled with -ffunction-sections, the fall-through into another function would just happen by chance when the linker merges similar sections together? > How is alignment handled? If I have two functions in the same section > with different .align directives will these be respected when the > section is split apart? Is it OK for a loop within a function to have > a .align? 
Yes, alignment is handled. Each atom has a separate alignment, which is derived from the position where the atom was in the section and the alignment of the section itself. > > What about relocations? If calls are implemented with branches taking > pc-relative offsets then the assembler might patch in the branch > offset and not emit a relocation. This clearly prevents functions from > being removed / reordered, so I assume it is a requirement that a safe > section always uses relocations for branches between functions and if > it has a choice of long or short branches it always conservatively uses > a long branch. This should be made explicit in the description of safe. Yes, you are right. > > If you have a symbol at the same address as a function how do you > decide if it should be associated with this function or the end of the > last function? Are you talking about weak symbols here? > > Is it a requirement that there are no references to symbols defined > inside the function except for the function symbol itself? If so how > does this work when you have debug info (which might have references > to addresses within the function)? > The model needs to read the debug information that corresponds to the function and keep it housed within the atom data structure itself. Thanks Shankar Easwaran -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation From swlin at post.harvard.edu Wed Jul 31 09:04:16 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Wed, 31 Jul 2013 09:04:16 -0700 Subject: [LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI In-Reply-To: <87y58ngggy.fsf@wanadoo.es> References: <51F6676E.2090800@free.fr> <51F670DF.5080804@free.fr> <878v0phswm.fsf@wanadoo.es> <87y58ngggy.fsf@wanadoo.es> Message-ID: 
Yes, for example, as explained by Bjarne once, incompatible name
mangling schemes for ABIs that are not guaranteed to be 100% binary
compatible are a _feature_, not a bug, since they prevent anyone from
even beginning to develop a workflow that relies upon linking possibly
incompatible binaries.

>
> Saying that supporting the MS C++ ABI is an uphill battle is an
> understatement (and better say MS C++ ABI*S*, because it evolved over
> time and it is known that it will change in future releases.) As far as
> I'm concerned, I'll never base my decisions on compiler usage on the
> advertisement of MS compatibility by Clang++, because I *know* that for a
> very long time (maybe forever) whatever MS C++ libraries work with
> Clang++ do so by luck. That's what happens when you try to implement
> compatibility with an undocumented, proprietary, complex feature.

This is my point as well; it just seems odd to spend so much effort to
implement an undocumented ABI with no guarantee of support or
stability (outside of MSVC major version releases). Furthermore, I
feel that, by reinforcing the notion that use of the MSVC++ ABI is
required for development on Windows (when no system-provided libraries
or ABIs require it), Clang developers are hindering the adoption of
Clang on Windows rather than promoting it, since it is unlikely that
any party that is not personally invested in Clang development would
be willing to depend on a reverse-engineered implementation of a
"required" but undocumented ABI for production use. Furthermore, it
means that Clang will always be behind the curve on Windows, since,
even if MSVC++ ABI support is fully usable one day, no one will be
willing to link it with object files from a new major version of MSVC
without some critical mass of users being guinea pigs first and
ensuring that there are no major bugaboos.
I agree that there is value in implementing the full MSVC++
ABI, if it can be done, but it seems like that support can never be
100% complete (or, more to the point, known to be 100% complete)
unless Microsoft itself decides to officially support the
implementation or completely stabilize and document their C++ ABIs.
However, I personally think that it is not only easier but also more
valuable to implement a C++ ABI which is compatible with the officially
supported C and COM MSVC++ ABI subsets and consciously _not_
compatible in other ways (in particular, which does not even attempt
to link C++ symbols with MSVC++, since that is not required for COM).
This is something that can be made to work and guaranteed (barring
some seriously misguided behavior from Microsoft) to continue to work
stably.

Furthermore, by providing and controlling its own Windows C++ ABI,
Clang can possibly offer something that MSVC *cannot* (and does not
even try to do): a development platform and ecosystem which provides a
stable-over-time C++ ABI and consistent cross-module usage of RTTI,
STL, C++ memory management, etc. (which is dicey even within MSVC
major versions unless the exact same build settings are used, last
time I checked.) This would give someone an active reason to switch to
Clang more than anything else Clang currently offers on Windows.

Stephen

From micah.villmow at smachines.com  Wed Jul 31 09:22:57 2013
From: micah.villmow at smachines.com (Micah Villmow)
Date: Wed, 31 Jul 2013 16:22:57 +0000
Subject: [LLVMdev] MachineBasicBlocks Cloning
In-Reply-To: <31038302-43CA-4E8F-B9FB-B510E4BC3CE8@arcor.de>
References: <31038302-43CA-4E8F-B9FB-B510E4BC3CE8@arcor.de>
Message-ID: <3947CD34E13C4F4AB2D94AD35AE3FE600749867D@smi-exchange1.smi.local>

Cloning a basic block should be a utility function that can be called from any machine pass. That being said, it should only be responsible for cloning the basic block and not updating any analysis passes that were run on the basic block.
Those would have to be rerun separately.

Micah

> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On Behalf Of Nico
> Sent: Wednesday, July 31, 2013 5:16 AM
> To: LLVM Developers Mailing List
> Subject: [LLVMdev] MachineBasicBlocks Cloning
>
> Hi,
>
> For some schedulers like Trace Scheduling it is necessary to clone basic
> blocks.
> Instinctively I would think the "Machine Instruction Scheduler" would be the
> right place to do so.
>
> Is it possible to clone MachineBasicBlocks in the "Machine Instruction
> Scheduler" pass? Any snares? Or is it too much effort to implement it there
> and there is a better place for such things?
>
> Thank you,
> Nico
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

From clattner at apple.com  Wed Jul 31 09:54:20 2013
From: clattner at apple.com (Chris Lattner)
Date: Wed, 31 Jul 2013 09:54:20 -0700
Subject: [LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI
In-Reply-To:
References: <51F6676E.2090800@free.fr> <51F670DF.5080804@free.fr> <878v0phswm.fsf@wanadoo.es> <87y58ngggy.fsf@wanadoo.es>
Message-ID: <72DFC3DF-FA19-4885-AFCB-64FC229446FF@apple.com>

This thread is odd to me. It seems that the gist of your guys' argument is that you don't know if we will ever get full support, therefore we don't welcome progress towards that (very useful) goal/feature.

If the specific proposal doesn't make sense from a design standpoint, that's one thing, but saying we shouldn't take it because of licensing issues with MFC or because it is harmful to be partially (but not fully) compatible with MSVC seems just weird to me.

-Chris

On Jul 31, 2013, at 9:04 AM, Stephen Lin wrote:

>> Quite the contrary, knowing that Clang's C++ ABI is completely
>> incompatible with MS is a maintenance *simplification*.
> > Yes, for example, as explained by Bjarne once, incompatible name > mangling schemes for ABIs that are not guaranteed to be 100% binary > compatible is a _feature_, not a bug, since it prevents anyone from > even beginning to develop a workflow that relies upon linking possibly > incompatible binaries. > >> >> Saying that supporting the MS C++ ABI is an uphill battle is an >> understatement (and better say MS C++ ABI*S*, because it evolved over >> time and it is known that it will change on future releases.) As far as >> I'm concerned, I'll never base my decisions on compiler usage on the >> advertisement of MS compatibility by Clang++, becase I *know* that for a >> very long time (maybe forever) whatever MS C++ libraries that work with >> Clang++ is by luck. That's what happens when you try to implement >> compatibility with an undocumented, propietary, complex feature. > > This is my point as well; it just seems odd to spend so much effort to > implement and undocumented ABI with no guarantee of support or > stability (outside of MSVC major version releases). Furthermore, I > feel that, by reinforcing the notion that use of the MSVC++ ABI is > required for development on Windows (when no system-provided libraries > or ABIs require it), Clang developers are hindering the adoption of > Clang on Windows rather than promoting it, since it is unlikely that > any party that is not personally invested in Clang development would > be willing to depend on a backward engineered implementation of a > "required" but undocumented ABI for production use. Furthermore, it > means that Clang will always be behind the curve on Windows, since, > even if MSVC++ ABI support is fully usable one day, no one will be > willing to link it with object files from a new major version of MSVC > without some critical mass of users being guinea pigs first and > ensuring that there are no major bugaboos. 
> > I agree that there is value in supporting implementing full MSVC++ > ABI, if it can be done, but it seems like that support can never be > 100% complete (or, more to the point, known to be 100% complete) > unless Microsoft itself decides to officially support the > implementation or completely stabilize and document their C++ ABIs. > However, I personally think that it is not only easier, there is more > value in implementing a C++ ABI which is compatible with officially > supported C and COM MSVC++ ABI subsets and consciously _not_ > compatible in others way (in particular, which does not even attempt > to link C++ symbols with MSVC++, since that is not required for COM). > This is something that can be made to work and guaranteed (barring > some seriously misguided behavior from Microsoft) to continue to work > stably. > > Furthermore, by providing and controlling its own Windows C++ ABI: > Clang can possibly offer something that MSVC *cannot* (and does not > even try to do): a development platform and ecosystem which provides a > stable-over-time C++ ABI and consistent cross-module usage of RTTI, > STL, C++ memory management, etc. (which is dicey even within MSVC > major versions unless the exact same build settings are used, last > time I checked.) This would give someone an active reason to switch to > Clang more than anything else Clang currently offers on Windows. 
> > Stephen > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From annulen at yandex.ru Wed Jul 31 10:10:32 2013 From: annulen at yandex.ru (Konstantin Tokarev) Date: Wed, 31 Jul 2013 21:10:32 +0400 Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections In-Reply-To: <51F88368.8060707@codeaurora.org> References: <51F1664F.1040003@codeaurora.org> <51F19211.6050403@codeaurora.org> <0E723450-43B9-4E32-BFE6-EEF5468128A2@apple.com> <868799E2-FECF-4CBA-B48A-33080712EACD@apple.com> <51F88368.8060707@codeaurora.org> Message-ID: <149691375290632@web4d.yandex.ru> 31.07.2013, 07:49, "Shankar Easwaran" : > On 7/30/2013 7:50 PM, Eric Christopher wrote: > >>  On Tue, Jul 30, 2013 at 5:36 PM, Nick Kledzik wrote: >>>  On Jul 30, 2013, at 4:28 PM, Eric Christopher wrote: >>>>  On Mon, Jul 29, 2013 at 9:24 AM, Nick Kledzik wrote: >>>>>  On Jul 25, 2013, at 2:10 PM, Rui Ueyama wrote: >>>>>>  Is there any reason -ffunction-sections and -fdata-sections wouldn't work? If it'll work, it may be be better to say "if you want to get a better linker output use these options", rather than defining new ELF section. >>>>>   From my understanding, -ffunction-sections is a good semantic match.  But it introduces a lot of bloat in the .o file which the linker must process. >>>>  Drive by comment here: >>>> >>>>  Other than the overhead of the section header I'm not sure what bloat >>>>  you're talking about here that the linker needs to process? >>>  The internal model of lld is "atom" based.  Each atom is an indivisible run of bytes.  A compiler generated function naturally matches that and should be an atom.  The problem is that a hand written assembly code could look like it has a couple of functions, but there could be implicit dependencies (like falling through to next function). 
>>  I'll stipulate all of this :)
>>>  If an object file has a hundred functions, that means there will be a hundred more sections (one per function).    So, if we used -ffunction-sections to determine that an object file was compiler generated, we still have the problem that an assembly language programmer could have hand written extra sections that look like -ffunction-sections would have produced, but he did something tricky like have one function with two entry symbols.  So, the linker would need to double check all those hundred sections.
>>  I'm not talking about using -ffunction-sections to determine if
>>  something is compiler generated, just that there's no inherent penalty
>>  in using -ffunction-sections in general. Basically there's no benefit
>>  (unless you allow a flag per object, etc) that says whether or not
>>  something is "compiler generated", you may as well just use a flag to
>>  the linker or a section in the output (the latter is a fairly common
>>  elf-ism).
>
> When you consider the complete link line (consisting of multiple
> archives and object files), you may not have all of them compiled with
> -ffunction-sections and -fdata-sections. It's also a problem that third
> party vendors would provide a library which may / may not be compiled
> with that flag.

You don't have to compile ALL sources with -ffunction-sections and -fdata-sections in order to use --gc-sections in the linker. You can freely mix objects built with and without one or both of these options in one link.

Moreover, you can even get some size savings (very small though) when using --gc-sections without use of -ffunction-sections and -fdata-sections at all.
-- 
Regards,
Konstantin

From swlin at post.harvard.edu  Wed Jul 31 10:13:20 2013
From: swlin at post.harvard.edu (Stephen Lin)
Date: Wed, 31 Jul 2013 10:13:20 -0700
Subject: [LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI
In-Reply-To: <72DFC3DF-FA19-4885-AFCB-64FC229446FF@apple.com>
References: <51F6676E.2090800@free.fr> <51F670DF.5080804@free.fr> <878v0phswm.fsf@wanadoo.es> <87y58ngggy.fsf@wanadoo.es> <72DFC3DF-FA19-4885-AFCB-64FC229446FF@apple.com>
Message-ID:

Oh, well, I don't actually have any objection to the patch (I'm not
sure if Oscar does) or work in this direction. (So apologies for
hijacking, it's just I wanted to back up the sentiment that Oscar
expressed initially.)

I'm honestly just trying to understand why the engineering focus is
where it is, and wonder if anyone has put any thought into supporting
our own (or possibly someone else's, if it's documented) C++ ABI on
Windows, as either a subgoal or parallel goal of full MSVC C++ ABI
implementation. As I said before, something like that wouldn't just be
a stop gap, it would provide real benefits on its own.

I'll stop hijacking this thread for that, though, sorry.

Stephen

On Wed, Jul 31, 2013 at 9:54 AM, Chris Lattner wrote:
> This thread is odd to me. It seems that the gist of your guys' argument is that you don't know if we will ever get full support, therefore we don't welcome progress towards that (very useful) goal/feature.
>
> If the specific proposal doesn't make sense from a design standpoint, that's one thing, but saying we shouldn't take it because of licensing issues with MFC or because it is harmful to be partially (but not fully) compatible with MSVC seems just weird to me.
>
> -Chris
>
> On Jul 31, 2013, at 9:04 AM, Stephen Lin wrote:
>
>>> Quite the contrary, knowing that Clang's C++ ABI is completely
>>> incompatible with MS is a maintenance *simplification*.
>> >> Yes, for example, as explained by Bjarne once, incompatible name >> mangling schemes for ABIs that are not guaranteed to be 100% binary >> compatible is a _feature_, not a bug, since it prevents anyone from >> even beginning to develop a workflow that relies upon linking possibly >> incompatible binaries. >> >>> >>> Saying that supporting the MS C++ ABI is an uphill battle is an >>> understatement (and better say MS C++ ABI*S*, because it evolved over >>> time and it is known that it will change on future releases.) As far as >>> I'm concerned, I'll never base my decisions on compiler usage on the >>> advertisement of MS compatibility by Clang++, becase I *know* that for a >>> very long time (maybe forever) whatever MS C++ libraries that work with >>> Clang++ is by luck. That's what happens when you try to implement >>> compatibility with an undocumented, propietary, complex feature. >> >> This is my point as well; it just seems odd to spend so much effort to >> implement and undocumented ABI with no guarantee of support or >> stability (outside of MSVC major version releases). Furthermore, I >> feel that, by reinforcing the notion that use of the MSVC++ ABI is >> required for development on Windows (when no system-provided libraries >> or ABIs require it), Clang developers are hindering the adoption of >> Clang on Windows rather than promoting it, since it is unlikely that >> any party that is not personally invested in Clang development would >> be willing to depend on a backward engineered implementation of a >> "required" but undocumented ABI for production use. Furthermore, it >> means that Clang will always be behind the curve on Windows, since, >> even if MSVC++ ABI support is fully usable one day, no one will be >> willing to link it with object files from a new major version of MSVC >> without some critical mass of users being guinea pigs first and >> ensuring that there are no major bugaboos. 
>> >> I agree that there is value in supporting implementing full MSVC++ >> ABI, if it can be done, but it seems like that support can never be >> 100% complete (or, more to the point, known to be 100% complete) >> unless Microsoft itself decides to officially support the >> implementation or completely stabilize and document their C++ ABIs. >> However, I personally think that it is not only easier, there is more >> value in implementing a C++ ABI which is compatible with officially >> supported C and COM MSVC++ ABI subsets and consciously _not_ >> compatible in others way (in particular, which does not even attempt >> to link C++ symbols with MSVC++, since that is not required for COM). >> This is something that can be made to work and guaranteed (barring >> some seriously misguided behavior from Microsoft) to continue to work >> stably. >> >> Furthermore, by providing and controlling its own Windows C++ ABI: >> Clang can possibly offer something that MSVC *cannot* (and does not >> even try to do): a development platform and ecosystem which provides a >> stable-over-time C++ ABI and consistent cross-module usage of RTTI, >> STL, C++ memory management, etc. (which is dicey even within MSVC >> major versions unless the exact same build settings are used, last >> time I checked.) This would give someone an active reason to switch to >> Clang more than anything else Clang currently offers on Windows. 
>> >> Stephen >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From clattner at apple.com Wed Jul 31 10:20:22 2013 From: clattner at apple.com (Chris Lattner) Date: Wed, 31 Jul 2013 10:20:22 -0700 Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: <51F916C9.9060706@codeaurora.org> References: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> <51F692C1.2080901@codeaurora.org> <35E073FE-A0E4-4DE9-808C-050EEEDE00CE@apple.com> <51F7CF3C.9060603@codeaurora.org> <51F7F587.30304@gmail.com> <6237388B-B65C-4D71-8A97-C0BB6CB53018@apple.com> <51F916C9.9060706@codeaurora.org> Message-ID: On Jul 31, 2013, at 6:53 AM, Krzysztof Parzyszek wrote: > On 7/30/2013 11:44 PM, Chris Lattner wrote: >> >> The canonical form should be that loop invariants are hoisted. > > The canonical form should not depend on the knowledge as to what is invariant and what isn't. It has more to do with preserving certain "common" properties of a loop, such as header, preheader, latch branch, etc. Canonicalization of the IR is not about guarantees, it is about what assumptions passes can make and what form they have to tolerate. >> Optimizations should not depend on perfect loops. > > What do you mean by "perfect loops"? I was talking about perfect nests. I'm talking about perfect loop nests, as in the classical fortran loop transformation sense. -Chris -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From chandlerc at google.com  Wed Jul 31 10:33:52 2013
From: chandlerc at google.com (Chandler Carruth)
Date: Wed, 31 Jul 2013 10:33:52 -0700
Subject: [LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI
In-Reply-To:
References: <51F6676E.2090800@free.fr> <51F670DF.5080804@free.fr> <878v0phswm.fsf@wanadoo.es> <87y58ngggy.fsf@wanadoo.es> <72DFC3DF-FA19-4885-AFCB-64FC229446FF@apple.com>
Message-ID:

On Jul 31, 2013 10:16 AM, "Stephen Lin" wrote:
>
> Oh, well, I don't actually have any objection to the patch (I'm not
> sure if Oscar does) or work in this direction. (So apologies for
> hijacking, it's just I wanted to back up the sentiment that Oscar
> expressed initially.)
>
> I'm honestly just trying to understand why the engineering focus is
> where it is, and wonder if anyone has put any thought into supporting
> our own (or possibly someone else's, if it's documented) C++ ABI on
> Windows, as either a subgoal or parallel goal of full MSVC C++ ABI
> implementation. As I said before, something like that wouldn't just be
> a stop gap, it would provide real benefits on its own.

This patch and the others currently under development seem clearly aimed at compatibility. If you or others are interested in something else then mail the patches. I don't think any of this precludes the other.

>
> I'll stop hijacking this thread for that, though, sorry.

If you're around I think several of us will be at the social tomorrow. Always a good place to have such discussions.

>
> Stephen
>
> On Wed, Jul 31, 2013 at 9:54 AM, Chris Lattner wrote:
> > This thread is odd to me. It seems that the gist of your guys' argument is that you don't know if we will ever get full support, therefore we don't welcome progress towards that (very useful) goal/feature.
> > > > If the specific proposal doesn't make make sense from a design standpoint, that's one thing, but saying we shouldn't take it because of licensing issues with MFC or because it is harmful to be partially (but not fully) compatible with MSVC seems just weird to me. > > > > -Chris > > > > On Jul 31, 2013, at 9:04 AM, Stephen Lin wrote: > > > >>> Quite the contrary, knowing that Clang's C++ ABI is completely > >>> incompatible with MS is a maintenance *simplification*. > >> > >> Yes, for example, as explained by Bjarne once, incompatible name > >> mangling schemes for ABIs that are not guaranteed to be 100% binary > >> compatible is a _feature_, not a bug, since it prevents anyone from > >> even beginning to develop a workflow that relies upon linking possibly > >> incompatible binaries. > >> > >>> > >>> Saying that supporting the MS C++ ABI is an uphill battle is an > >>> understatement (and better say MS C++ ABI*S*, because it evolved over > >>> time and it is known that it will change on future releases.) As far as > >>> I'm concerned, I'll never base my decisions on compiler usage on the > >>> advertisement of MS compatibility by Clang++, becase I *know* that for a > >>> very long time (maybe forever) whatever MS C++ libraries that work with > >>> Clang++ is by luck. That's what happens when you try to implement > >>> compatibility with an undocumented, propietary, complex feature. > >> > >> This is my point as well; it just seems odd to spend so much effort to > >> implement and undocumented ABI with no guarantee of support or > >> stability (outside of MSVC major version releases). 
Furthermore, I > >> feel that, by reinforcing the notion that use of the MSVC++ ABI is > >> required for development on Windows (when no system-provided libraries > >> or ABIs require it), Clang developers are hindering the adoption of > >> Clang on Windows rather than promoting it, since it is unlikely that > >> any party that is not personally invested in Clang development would > >> be willing to depend on a backward engineered implementation of a > >> "required" but undocumented ABI for production use. Furthermore, it > >> means that Clang will always be behind the curve on Windows, since, > >> even if MSVC++ ABI support is fully usable one day, no one will be > >> willing to link it with object files from a new major version of MSVC > >> without some critical mass of users being guinea pigs first and > >> ensuring that there are no major bugaboos. > >> > >> I agree that there is value in supporting implementing full MSVC++ > >> ABI, if it can be done, but it seems like that support can never be > >> 100% complete (or, more to the point, known to be 100% complete) > >> unless Microsoft itself decides to officially support the > >> implementation or completely stabilize and document their C++ ABIs. > >> However, I personally think that it is not only easier, there is more > >> value in implementing a C++ ABI which is compatible with officially > >> supported C and COM MSVC++ ABI subsets and consciously _not_ > >> compatible in others way (in particular, which does not even attempt > >> to link C++ symbols with MSVC++, since that is not required for COM). > >> This is something that can be made to work and guaranteed (barring > >> some seriously misguided behavior from Microsoft) to continue to work > >> stably. 
> >> > >> Furthermore, by providing and controlling its own Windows C++ ABI: > >> Clang can possibly offer something that MSVC *cannot* (and does not > >> even try to do): a development platform and ecosystem which provides a > >> stable-over-time C++ ABI and consistent cross-module usage of RTTI, > >> STL, C++ memory management, etc. (which is dicey even within MSVC > >> major versions unless the exact same build settings are used, last > >> time I checked.) This would give someone an active reason to switch to > >> Clang more than anything else Clang currently offers on Windows. > >> > >> Stephen > >> _______________________________________________ > >> LLVM Developers mailing list > >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: From kparzysz at codeaurora.org Wed Jul 31 10:40:42 2013 From: kparzysz at codeaurora.org (Krzysztof Parzyszek) Date: Wed, 31 Jul 2013 12:40:42 -0500 Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: References: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> <51F692C1.2080901@codeaurora.org> <35E073FE-A0E4-4DE9-808C-050EEEDE00CE@apple.com> <51F7CF3C.9060603@codeaurora.org> <51F7F587.30304@gmail.com> <6237388B-B65C-4D71-8A97-C0BB6CB53018@apple.com> <51F916C9.9060706@codeaurora.org> Message-ID: <51F94C1A.7050304@codeaurora.org> On 7/31/2013 12:20 PM, Chris Lattner wrote: > On Jul 31, 2013, at 6:53 AM, Krzysztof Parzyszek > > wrote: >> On 7/30/2013 11:44 PM, Chris Lattner wrote: >>> >>> The canonical form should be that loop invariants are hoisted. >> >> The canonical form should not depend on the knowledge as to what is >> invariant and what isn't. 
It has more to do with preserving certain >> "common" properties of a loop, such as header, preheader, latch >> branch, etc. > > Canonicalization of the IR is not about guarantees, it is about what > assumptions passes can make and what form they have to tolerate. Then the transformations that need to assume that invariants were hoisted could be run after LICM. Since loops are defined in terms of regions (back-edges), what makes sense is a canonical form that normalizes the structure into a common form. Whether the invariants are hoisted or not doesn't need to be a part of that. >>> Optimizations should not depend on perfect loops. >> >> What do you mean by "perfect loops"? I was talking about perfect nests. > > I'm talking about perfect loop nests, as in the classical fortran loop > transformation sense. Most nest optimizations only apply to perfect nests. Each such optimization could try to "fix" the nest for its own purposes, but it would be a lot of duplicated effort. Whatever c-n does, it should not be getting in the way. -K -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation From shuxin.llvm at gmail.com Wed Jul 31 11:02:56 2013 From: shuxin.llvm at gmail.com (Shuxin Yang) Date: Wed, 31 Jul 2013 11:02:56 -0700 Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: <51F94C1A.7050304@codeaurora.org> References: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> <51F692C1.2080901@codeaurora.org> <35E073FE-A0E4-4DE9-808C-050EEEDE00CE@apple.com> <51F7CF3C.9060603@codeaurora.org> <51F7F587.30304@gmail.com> <6237388B-B65C-4D71-8A97-C0BB6CB53018@apple.com> <51F916C9.9060706@codeaurora.org> <51F94C1A.7050304@codeaurora.org> Message-ID: <51F95150.3060700@gmail.com> >> I'm talking about perfect loop nests, as in the classical fortran loop >> transformation sense. > > Most nest optimizations only apply to perfect nests. 
Each such
> optimization could try to "fix" the nest for its own purposes, but it
> would be a lot of duplicated effort.

If each LNO pass has to do the fixing by itself, I would say the LNO component is pretty lame. IMHO, it is natural to run a preparation pass to shape the loop nests into a form such that the following LNO optimizers feel comfortable kicking in. Such a preparation/fixup pass should include getting rid of the imperfect parts, plus fusion and fission on individual nests or neighboring nests, at the appropriate nest level.

In theory, it is always possible to get rid of the imperfect part by tucking it into the loop under a condition like "iv == 1st iteration", or, where possible, by distributing the imperfect part. I don't think the imperfect parts created by LICM need such an expensive transformation; you just place them back in the appropriate loop. In this sense, LICM should not become the culprit for disabling some optimizers.

You might argue that running LICM early will miss the opportunities created by permutation. True. But, now that permutation has detailed dep-test results, it clearly understands which memory access is invariant w.r.t. which loop of the nest, so why not move them to the right place? It is trivial for such optimizers, but it seems to be quite hard for a following scalar optimizer, unless it can afford to run the expensive dependence test again.

IMHO, I don't see a strong disadvantage to running LICM either before or after LNO.
From rnk at google.com Wed Jul 31 11:05:53 2013 From: rnk at google.com (Reid Kleckner) Date: Wed, 31 Jul 2013 11:05:53 -0700 Subject: [LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI In-Reply-To: References: <51F6676E.2090800@free.fr> <51F670DF.5080804@free.fr> <878v0phswm.fsf@wanadoo.es> <87y58ngggy.fsf@wanadoo.es> <72DFC3DF-FA19-4885-AFCB-64FC229446FF@apple.com> Message-ID: On Wed, Jul 31, 2013 at 10:33 AM, Chandler Carruth wrote: > > On Jul 31, 2013 10:16 AM, "Stephen Lin" wrote: > > > > Oh, well, I don't actually have any objection to the patch (I'm not > > sure if Oscar does) or work in this direction. (So apologies for > > hijacking, it's just I wanted to back up the sentiment that Oscar > > expressed initially.) > > > > I'm honestly just trying to understand why the engineering focus is > > where it is, and wonders if anyone has put any thought into supporting > > our own (or possibly someone else's, if it's documented) C++ ABI on > > Windows, as a either a subgoal or parallel goal of full MSVC C++ ABI > > implementation. As I said before, something like that wouldn't just be > > a stop gap, it would provide real benefits on its own. > > Thia patch and the others currently under development seem clearly aimed > at compatibility > If you or others are interested in something else then mail the patches. I > don't think any of this precludes the other. > Just to clarify, there's no patch for this particular issue yet. This is an RFC so I don't waste time on patches that will be rejected due to high level concerns. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kparzysz at codeaurora.org Wed Jul 31 11:12:15 2013 From: kparzysz at codeaurora.org (Krzysztof Parzyszek) Date: Wed, 31 Jul 2013 13:12:15 -0500 Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: <51F95150.3060700@gmail.com> References: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> <51F692C1.2080901@codeaurora.org> <35E073FE-A0E4-4DE9-808C-050EEEDE00CE@apple.com> <51F7CF3C.9060603@codeaurora.org> <51F7F587.30304@gmail.com> <6237388B-B65C-4D71-8A97-C0BB6CB53018@apple.com> <51F916C9.9060706@codeaurora.org> <51F94C1A.7050304@codeaurora.org> <51F95150.3060700@gmail.com> Message-ID: <51F9537F.60703@codeaurora.org> On 7/31/2013 1:02 PM, Shuxin Yang wrote: > > You might argue running LICM early will miss the opportunities created > by permutation. True. > But, now that permutation has detailed dep-test result, it clearly > understand which mem-access > is invariant w.r.t which nest loop, why not move them to right place. Because the next nest optimization would have to deal with it again. As long as we are working on high-level structures, like loop nests, the details (such as invariance of some expression) should not concern us. That would be a job for a subsequent, lower-level pass (e.g. working on a single loop at a time). -K -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation From swlin at post.harvard.edu Wed Jul 31 11:51:08 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Wed, 31 Jul 2013 11:51:08 -0700 Subject: [LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI In-Reply-To: References: <51F6676E.2090800@free.fr> <51F670DF.5080804@free.fr> <878v0phswm.fsf@wanadoo.es> <87y58ngggy.fsf@wanadoo.es> <72DFC3DF-FA19-4885-AFCB-64FC229446FF@apple.com> Message-ID: > Thia patch and the others currently under development seem clearly aimed at > compatibility > If you or others are interested in something else then mail the patches. 
I > don't think any of this precludes the other. Thanks. Unfortunately, it's not something I can prioritize working on, so all I can do at this point is express my opinion and hope someone agrees with me :) And yes, of course, it's not mutually exclusive at all; in fact, it seems like from a technical perspective a lot of the work for a C- and COM-compliant ABI (but Itanium-like otherwise, perhaps) would be a subset of the work of doing full compatibility--things would just need to be repackaged and exposed in a different way. So it would be great if someone actively working in this area can keep such an idea in mind, since I think it would be valuable for reasons other than as a stop gap. (A stable and sane Windows C++ ABI with quality compiler support would have been a godsend to me in a former life, personally.) Stephen From pranav.garg2107 at gmail.com Wed Jul 31 13:09:44 2013 From: pranav.garg2107 at gmail.com (Pranav Garg) Date: Wed, 31 Jul 2013 15:09:44 -0500 Subject: [LLVMdev] Error building compiler-rt Message-ID: Hi, I am trying to build llvm along with clang and compiler-rt. When I run make, I am getting the following compilation error (I tried compiling llvm-3.2, which is what I need for my project, but also tried llvm-3.3 and the current llvm source from the git repository). ...
COMPILE: clang_linux/full-x86_64/x86_64: /home/pranav/smack-project/llvm/src/projects/compiler-rt/lib/enable_execute_stack.c /home/pranav/smack-project/llvm/src/projects/compiler-rt/lib/enable_execute_stack.c:53:29: error: cast to 'unsigned char *' from smaller integer type 'unsigned int' [-Werror,-Wint-to-pointer-cast] unsigned char* startPage = (unsigned char*)(p & pageAlignMask); ^ /home/pranav/smack-project/llvm/src/projects/compiler-rt/lib/enable_execute_stack.c:54:27: error: cast to 'unsigned char *' from smaller integer type 'unsigned int' [-Werror,-Wint-to-pointer-cast] unsigned char* endPage = (unsigned char*)((p+TRAMPOLINE_SIZE+pageSize) & pageAlignMask); ^ 2 errors generated. ... On gcc --version I get the following output: gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 My operating system is a Ubuntu 12.04.1 LTS. On typing uname -a I get: Linux pranav 3.2.0-33-generic-pae #52-Ubuntu SMP Thu Oct 18 16:39:21 UTC 2012 i686 i686 i386 GNU/Linux Any ideas as to how to resolve this compilation error? Thanks Pranav -------------- next part -------------- An HTML attachment was scrubbed... URL: From arsenm2 at gmail.com Wed Jul 31 13:24:19 2013 From: arsenm2 at gmail.com (Matt Arsenault) Date: Wed, 31 Jul 2013 13:24:19 -0700 Subject: [LLVMdev] Error building compiler-rt In-Reply-To: References: Message-ID: <807F0765-493C-4C08-9D32-29C73C3EB7C7@gmail.com> You can disable -Werror by adding the cmake flag -DLLVM_ENABLE_WERROR=OFF, which should let it just ignore that (that's also the default, so you must have turned it on somewhere) On Jul 31, 2013, at 13:09 , Pranav Garg wrote: > Hi, > > I am trying to build llvm along with clang and compiler-rt. When I run make, I am getting the following compilation error (I tried compiling llvm-3.2, which is what I need for my project, but also tried llvm-3.3 and the current llvm source from the git repository). > > ... 
> COMPILE: clang_linux/full-x86_64/x86_64: /home/pranav/smack-project/llvm/src/projects/compiler-rt/lib/enable_execute_stack.c /home/pranav/smack-project/llvm/src/projects/compiler-rt/lib/enable_execute_stack.c:53:29: error: cast to 'unsigned char *' from smaller integer type 'unsigned int' [-Werror,-Wint-to-pointer-cast] unsigned char* startPage = (unsigned char*)(p & pageAlignMask); ^ /home/pranav/smack-project/llvm/src/projects/compiler-rt/lib/enable_execute_stack.c:54:27: error: cast to 'unsigned char *' from smaller integer type 'unsigned int' [-Werror,-Wint-to-pointer-cast] unsigned char* endPage = (unsigned char*)((p+TRAMPOLINE_SIZE+pageSize) & pageAlignMask); ^ 2 errors generated. ... On gcc --version I get the following output: gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 My operating system is a Ubuntu 12.04.1 LTS. On typing uname -a I get: Linux pranav 3.2.0-33-generic-pae #52-Ubuntu SMP Thu Oct 18 16:39:21 UTC 2012 i686 i686 i386 GNU/Linux > Any ideas as to how to resolve this compilation error? > Thanks > Pranav > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From jfb at google.com Wed Jul 31 14:32:03 2013 From: jfb at google.com (JF Bastien) Date: Wed, 31 Jul 2013 14:32:03 -0700 Subject: [LLVMdev] Intended semantics for ``fence seq_cst`` Message-ID: Hi, TL;DR: should we add a new memory ordering to fences? ``fence seq_cst`` is currently used to represent two things: - GCC-style builtin ``__sync_synchronize()`` [0][1]. - C11/C++11's sequentially-consistent thread fence ``std::atomic_thread_fence(std::memory_order_seq_cst)`` [2]. As far as I understand: - The former orders all memory and emits an actual fence instruction.
- The latter only provides a total order with other sequentially-consistent loads and stores, which means that it's possible to move non-sequentially-consistent loads and stores around it. The GCC-style builtin effectively does the same as the C11/C++11 sequentially-consistent thread fence, surrounded by compiler barriers (``call void asm sideeffect "", "~{memory}"``). The LLVM language reference [3] describes ``fence seq_cst`` in terms of the C11/C++11 primitive, but it looks like LLVM's codebase treats it like the GCC-style builtin. That's strictly correct, but it seems desirable to represent the GCC-style builtin with a ninth LLVM-internal memory ordering that's stricter than ``llvm::SequentiallyConsistent``. ``fence seq_cst`` could then fully utilize C11/C++11's semantics, without breaking the GCC-style builtin. From C11/C++11's point of view this other memory ordering isn't useful because the primitives offered are sufficient to express valid and performant code, but I believe that LLVM needs this new memory ordering to accurately represent the GCC-style builtin while fully taking advantage of the C11/C++11 memory model. Am I correct? I don't think it's worth implementing just yet since C11/C++11 are still relatively new, but I'd like to be a bit forward looking for PNaCl's sake.
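To make the distinction concrete, here is a sketch (illustrative, not from the standard) of the fence-based synchronization that C11/C++11 does guarantee; note that it only works because the flag is an atomic object:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Illustrative only: the C++11 seq_cst fences order these accesses because
// the flag is a std::atomic object. With a plain or volatile flag (the
// legacy __sync_synchronize() pattern), the standard guarantees nothing.
static std::atomic<int> flag{0};
static int value = 0;  // non-atomic payload, published through the fences

void producer() {
  value = 42;
  std::atomic_thread_fence(std::memory_order_seq_cst);
  flag.store(1, std::memory_order_relaxed);
}

int consumer() {
  while (flag.load(std::memory_order_relaxed) == 0) {
    // spin until the producer publishes
  }
  std::atomic_thread_fence(std::memory_order_seq_cst);
  return value;  // the fences synchronize, so this observes 42
}
```

Make ``flag`` a plain or volatile int, as in pre-C11 code, and the second fence no longer has an atomic access to pair with; that gap is exactly what the GCC-style builtin papers over.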
Thanks, JF [0] http://gcc.gnu.org/onlinedocs/gcc-4.8.1/gcc/_005f_005fsync-Builtins.html [1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36793 [2] C++11 Standard section 29.8 - Fences [3] http://llvm.org/docs/LangRef.html#fence-instruction From richard at xmos.com Wed Jul 31 14:35:44 2013 From: richard at xmos.com (Richard Osborne) Date: Wed, 31 Jul 2013 21:35:44 +0000 Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections In-Reply-To: <51F92DFC.2070004@codeaurora.org> References: <51F1664F.1040003@codeaurora.org> <51F8E912.4080600@xmos.com> <51F92DFC.2070004@codeaurora.org> Message-ID: <56B86283-A534-46A4-9F43-62673A77457B@xmos.com> On 31 Jul 2013, at 16:32, Shankar Easwaran wrote: > Thanks for your very detailed analysis. From other email conversations, it looks like -ffunction-sections and -fdata-sections are doing what is being iterated in the original proposal. Thanks for your answers :) > > On 7/31/2013 5:38 AM, Richard Osborne wrote: >> I'd like to see a more precise definition of "safe". For example just from the above description it is not clear that "safe" disallows one function falling through into another, but based on the intended use cases this clearly isn't allowed. >> > Doesn't this break the model even with ELF? For example, if the code would have been compiled with -ffunction-sections, the fall through into another would just happen by chance when the linker merges similar sections together? Yes, if one function fell through into another it wouldn't be valid to put them in separate sections either. I was just trying to pin down the meaning of "safe". >> >> If you have a symbol at the same address as a function how do you decide if it should be associated with this function or the end of the last function? > Are you talking about weak symbols here? I was thinking about DWARF which, IIRC uses references to symbols in various places to mark the beginning and end (one past the last byte) of various entities.
If multiple entities are placed in the same ELF section then, at least without making some assumptions, you can't tell the difference between a symbol pointing to the end of one entity and a symbol pointing to the start of the next. However if the entities are placed in different ELF sections you can use the symbol's section index to differentiate between the two cases. >> >> Is it a requirement that there are no references to symbols defined inside the function except for the function symbol itself? If so how does this work when you have debug info (which might have references to addresses within the function)? >> > The model needs to read the debug information that corresponds to the function and keep them housed within the atom data structure itself. Is there a write-up about how this works / is going to work? If I go to http://lld.llvm.org/design.html it says: "Currently, the lld model says nothing about debug info. But the most popular debug format is DWARF and there is some impedance mismatch with the lld model and DWARF" Is there any progress on this? Would the same mechanism be used for other cases where you might have references to symbols in the middle of a function (e.g. for jump tables / computed gotos). From pranav.garg2107 at gmail.com Wed Jul 31 14:54:41 2013 From: pranav.garg2107 at gmail.com (Pranav Garg) Date: Wed, 31 Jul 2013 16:54:41 -0500 Subject: [LLVMdev] Error building compiler-rt In-Reply-To: <807F0765-493C-4C08-9D32-29C73C3EB7C7@gmail.com> References: <807F0765-493C-4C08-9D32-29C73C3EB7C7@gmail.com> Message-ID: Hi, I see that ENABLE_WERROR is being set to off (the default value) in the config.log in the llvm build. However on grepping for WERROR in the compiler-rt folder I get the following output: pranav at pranav:~/smack-project/llvm-3.4/src/projects/compiler-rt$ grep -Rin WERROR * lib/asan/tests/CMakeLists.txt:38: -Werror lib/asan/asan_malloc_mac.cc:253:// This function is currently unused, and we build with -Werror. 
lib/tsan/check_cmake.sh:8:CC=clang CXX=clang++ cmake -DLLVM_ENABLE_WERROR=ON -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON $ROOT/../../../.. lib/tsan/Makefile.old:3:CXXFLAGS = -fPIE -fno-rtti -g -Wall -Werror \ lib/tsan/rtl/Makefile.old:1:CXXFLAGS = -fPIE -g -Wall -Werror -fno-builtin -DTSAN_DEBUG=$(DEBUG) -DSANITIZER_DEBUG=$(DEBUG) lib/tsan/go/buildgo.sh:68:FLAGS=" -I../rtl -I../.. -I../../sanitizer_common -I../../../include -m64 -Wall -Werror -Wno-maybe-uninitialized -fno-exceptions -fno-rtti -DTSAN_GO -DSANITIZER_GO -DTSAN_SHADOW_COUNT=4 $OSCFLAGS" lib/sanitizer_common/sanitizer_platform_limits_posix.cc:410: unsigned IOCTL_FDWERRORCLR = FDWERRORCLR; lib/sanitizer_common/sanitizer_platform_limits_posix.cc:411: unsigned IOCTL_FDWERRORGET = FDWERRORGET; lib/sanitizer_common/sanitizer_platform_limits_posix.h:442: extern unsigned IOCTL_FDWERRORCLR; lib/sanitizer_common/sanitizer_platform_limits_posix.h:443: extern unsigned IOCTL_FDWERRORGET; lib/sanitizer_common/tests/CMakeLists.txt:36: -Wall -Werror -Werror=sign-compare) lib/sanitizer_common/sanitizer_common_interceptors_ioctl.inc:177: _(FDWERRORCLR, NONE, 0); lib/sanitizer_common/sanitizer_common_interceptors_ioctl.inc:178: _(FDWERRORGET, WRITE, struct_floppy_write_errors_sz); make/options.mk:10:CFLAGS := -Wall -Werror make/platform/clang_linux.mk:85:CFLAGS := -Wall -Werror -O3 -fomit-frame-pointer make/platform/clang_darwin.mk:109:CFLAGS := -Wall -Werror -O3 -fomit-frame-pointer make/platform/multi_arch.mk:10:CFLAGS := -Wall -Werror make/platform/darwin_fat.mk:42:CFLAGS := -Wall -Werror I see that -DLLVM_ENABLE_WERROR is being set in the script "lib/tsan/check_cmake.sh". Should I disable -DLLVM_ENABLE_WERROR at this point? 
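As an aside, the -Wint-to-pointer-cast warning itself (independent of -Werror) comes from casting an integer narrower than a pointer; the usual fix is to do the address arithmetic in uintptr_t. A sketch of the idiom, not the actual compiler-rt patch:

```cpp
#include <cstdint>
#include <cassert>

// Sketch only: mirrors the casts in enable_execute_stack.c. Keeping the
// arithmetic in uintptr_t makes the integer exactly pointer-sized, so the
// subsequent int-to-pointer cast no longer triggers -Wint-to-pointer-cast.
unsigned char* page_start(uintptr_t p, uintptr_t pageAlignMask) {
  return reinterpret_cast<unsigned char*>(p & pageAlignMask);
}
```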
Thanks Pranav On Wed, Jul 31, 2013 at 3:24 PM, Matt Arsenault wrote: > You can disable -Werror by adding the cmake flag -DLLVM_ENABLE_WERROR=OFF, > which should let it just ignore that (that's also the default, so you must > have turned it on somewhere) > > On Jul 31, 2013, at 13:09 , Pranav Garg wrote: > > > Hi, > > > > I am trying to build llvm along with clang and compiler-rt. When I run > make, I am getting the following compilation error (I tried compiling > llvm-3.2, which is what I need for my project, but also tried llvm-3.3 and > the current llvm source from the git repository). > > > > ... > > COMPILE: clang_linux/full-x86_64/x86_64: > /home/pranav/smack-project/llvm/src/projects/compiler-rt/lib/enable_execute_stack.c > > > /home/pranav/smack-project/llvm/src/projects/compiler-rt/lib/enable_execute_stack.c:53:29: > error: cast to 'unsigned char *' from smaller integer type 'unsigned int' > > [-Werror,-Wint-to-pointer-cast] > > unsigned char* startPage = (unsigned char*)(p & pageAlignMask); > > ^ > > > /home/pranav/smack-project/llvm/src/projects/compiler-rt/lib/enable_execute_stack.c:54:27: > error: cast to 'unsigned char *' from smaller integer type 'unsigned int' > > [-Werror,-Wint-to-pointer-cast] > > unsigned char* endPage = (unsigned > char*)((p+TRAMPOLINE_SIZE+pageSize) & pageAlignMask); > > ^ > > 2 errors generated. > > ... > > > > On gcc --version I get the following output: > > gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 > > My operating system is a Ubuntu 12.04.1 LTS. On typing uname -a I get: > > Linux pranav 3.2.0-33-generic-pae #52-Ubuntu SMP Thu Oct 18 > 16:39:21 UTC 2012 i686 i686 i386 GNU/Linux > > > > Any ideas as to how to resolve this compilation error? 
> > > > Thanks > > Pranav > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brooks at freebsd.org Wed Jul 31 14:59:16 2013 From: brooks at freebsd.org (Brooks Davis) Date: Wed, 31 Jul 2013 16:59:16 -0500 Subject: [LLVMdev] Transitioning build to cmake In-Reply-To: References: <72BC7F3B-DFE5-425E-A185-3D7E1711874D@apple.com> Message-ID: <20130731215916.GQ13659@lor.one-eyed-alien.net> On Wed, Jul 24, 2013 at 12:40:59PM -0700, Sean Silva wrote: > On Wed, Jul 24, 2013 at 10:11 AM, Jeremy Huddleston Sequoia < > jeremyhu at apple.com> wrote: > > 4) Building clang using installed llvm > > > > It looks like there is some support for building clang against an > > installed llvm by setting CLANG_PATH_TO_LLVM_BUILD. This fails miserably > > in part because the installed llvm cmake files reference build time paths, > > but even after fixing that, there are tons of build failures. I'm guessing > > this is still a work in progress, but if I should file bugs, please let me > > know. > > This is probably not a very good idea because clang evolves in lock-step > with LLVM. Unless the installed LLVM is the same revision as the clang you > are building, things are likely to not work due to internal API changes. > The option you cite is more likely intended for when you build clang in a > directory separate from LLVM (rather than when it is in llvm/tools/clang/, > where things just work) but both are still checked out at the same revision. We do the same thing in FreeBSD and keeping the ability to build with the installed llvm is critical given our current ports/package infrastructure. The build with configure based builds is a bit of a hack, but seems to break quite infrequently. 
-- Brooks -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 188 bytes Desc: not available URL: From echristo at gmail.com Wed Jul 31 15:05:16 2013 From: echristo at gmail.com (Eric Christopher) Date: Wed, 31 Jul 2013 15:05:16 -0700 Subject: [LLVMdev] [PROPOSAL] ELF safe/unsafe sections In-Reply-To: <56B86283-A534-46A4-9F43-62673A77457B@xmos.com> References: <51F1664F.1040003@codeaurora.org> <51F8E912.4080600@xmos.com> <51F92DFC.2070004@codeaurora.org> <56B86283-A534-46A4-9F43-62673A77457B@xmos.com> Message-ID: >>> If you have a symbol at the same address as a function how do you decide if it should be associated with this function or the end of the last function? >> Are you talking about weak symbols here? > I was thinking about DWARF which, IIRC uses references to symbols in various places to mark the beginning and end (one past the last byte) of various entities. If multiple entities are placed in the same ELF section then, at least without making some assumptions, you can't tell the difference between a symbol pointing to the end of one entity and a symbol pointing to the start of the next. However if the entities are placed in different ELF sections you can use the symbol's section index to differentiate between the two cases. > Ideally these are going to be local symbols that won't be externally visible so that lld's atomizing code won't see them. And yes, I'd have thought that -ffunction-sections would just ensure trivialness of that atomizing code. Just needs to be done on a per section basis. >>> >>> Is it a requirement that there are no references to symbols defined inside the function except for the function symbol itself? If so how does this work when you have debug info (which might have references to addresses within the function)? 
>>> >> The model needs to read the debug information that corresponds to the function and keep them housed within the atom data structure itself. > > Is there a write-up about how this works / is going to work? If I go to http://lld.llvm.org/design.html it says: > > "Currently, the lld model says nothing about debug info. But the most popular debug format is DWARF and there is some impedance mismatch with the lld model and DWARF" > > Is there any progress on this? Would the same mechanism be used for other cases where you might have references to symbols in the middle of a function (e.g. for jump tables / computed gotos). > I'm rather curious what people are thinking of for the lld model here, I hadn't realized it was a concern. -eric From ofv at wanadoo.es Wed Jul 31 15:13:08 2013 From: ofv at wanadoo.es (=?utf-8?Q?=C3=93scar_Fuentes?=) Date: Thu, 01 Aug 2013 00:13:08 +0200 Subject: [LLVMdev] MSVC++ ABI compatibility is not a Windows requirement (was: Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI) References: <51F6676E.2090800@free.fr> <51F670DF.5080804@free.fr> <878v0phswm.fsf@wanadoo.es> <87y58ngggy.fsf@wanadoo.es> <72DFC3DF-FA19-4885-AFCB-64FC229446FF@apple.com> Message-ID: <87txjagyyz.fsf_-_@wanadoo.es> Chris Lattner writes: > This thread is odd to me. It seems that the gist of your guys' > argument is that you don't know if we will ever get full support, > therefore we don't welcome progress towards that (very useful) > goal/feature. > > If the specific proposal doesn't make make sense from a design > standpoint, that's one thing, but saying we shouldn't take it because > of licensing issues with MFC or because it is harmful to be partially > (but not fully) compatible with MSVC seems just weird to me. Full disclosure: I'm the guy who filled the PR requesting the feature the OP's is working on and, when someone tried to close it as WONTFIX, I strongly opposed. 
Furthermore, MSVC++ is my Windows compiler and my projects suffer from unwelcome complexity for working around the lack of this feature in LLVM. So I hope it is obvious that I'm not against Reid's proposal, quite the contrary. My point is that nowadays Clang++ is not a production-ready Windows compiler. The missing features are much easier to implement than MS C++ ABI compatibility. For starters, they are well documented. So it perplexes me to see the flux of work on MS C++ compatibility when: * It is an open-ended, difficult, uncertain goal. You'll never know for sure that it is done. * Obviously, if you don't implement the features that every Windows C++ compiler must have, you'll never be compatible with MS. Currently Clang++ cannot create or use C++ dlls (a series of patches were submitted but quickly rejected, I hope that Nico was not discouraged by this.) OTOH, implementing 32bit SEH is a must-have requirement for MS compatibility, but it would benefit Clang on Windows independently of the C++ ABI used. * For being the best Windows C++ compiler, Clang++ doesn't need to be compatible with the MS C++ ABI. However (and this is highly detrimental to Clang, IMO) from reading the mailing lists one could easily get the impression that until Clang++ gets the MS C++ ABI, it is not ready for Windows. It won't surprise me to learn that some contributors are working on the MS C++ ABI because they think that way. * Using its current C++ ABI and debug info format on Windows has the advantage of making Clang++ usable together with tools like existing debuggers and profilers. For using Clang with Visual Studio's IDE it must learn how to emit debug info compatible with MS, and that is another can of worms.
* As the MS C++ ABI is in flux, while at the same time Clang++ is well ahead of MS on conformance, some weird cases might arise where Clang++ would be forced to extend the ABI on its own at the risk of being incompatible with itself after a while (MS variadic templates implementation required an ABI change. MS engineers admit that planned features might require more changes. Of course, Clang++ already supports most of those features.) * Finally, legal issues remain: supposing that Clang++ gets MS C++ ABI compatibility, is it possible to use MS C++ runtime(s) and libraries with Clang++? We could end up with lots of work invested in a feature that only benefits those who are forced to work with third-party C++ libraries distributed in binary form. I guess that such users are a minority. To recap: lots of work is being invested in a feature which is very hard to implement, while at the same time Clang lacks basic features. I'll dare to say that focusing the effort on those features would mean that, in a year at most, Clang would become the best C++ compiler for Windows. Needless to say, everyone has the right to decide what they work on. I fully respect that. But I'd like to know what motivates those who are working so hard on the MS C++ ABI while Clang++ remains a compiler not ready for production on Windows.
From clattner at apple.com Wed Jul 31 15:24:32 2013 From: clattner at apple.com (Chris Lattner) Date: Wed, 31 Jul 2013 15:24:32 -0700 Subject: [LLVMdev] MSVC++ ABI compatibility is not a Windows requirement (was: Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI) In-Reply-To: <87txjagyyz.fsf_-_@wanadoo.es> References: <51F6676E.2090800@free.fr> <51F670DF.5080804@free.fr> <878v0phswm.fsf@wanadoo.es> <87y58ngggy.fsf@wanadoo.es> <72DFC3DF-FA19-4885-AFCB-64FC229446FF@apple.com> <87txjagyyz.fsf_-_@wanadoo.es> Message-ID: On Jul 31, 2013, at 3:13 PM, Óscar Fuentes wrote: > Full disclosure: I'm the guy who filled the PR requesting the feature > the OP's is working on and, when someone tried to close it as WONTFIX, I > strongly opposed. Furthermore, MSVC++ is my Windows compiler and my > projects suffer from unwelcome complexity for working around the lack of > this feature on LLVM. Good to know. > My point is that nowadays Clang++ is not a production-ready Windows > compiler. The missing features are much easier to implement than MS C++ > ABI compatibility. For starters, they are well documented. So it > perplexes me to see the flux of work on MS C++ compatibility when Ok, and it is fine for you to share your opinion. However, as an open source project, we generally don't have the luxury of telling contributors how they should spend their time. Suggesting to Chip (or anyone) that there are interesting other problems to solve is one thing, but standing in the way of progress because you think there are higher priorities is another. > * It is an open ended, difficult, uncertain goal. You'll never know for > sure that it is done. Well sure. If you rewind 5 years, people were saying the same thing about GCC compatibility. :-) > * Obviously, if you don't implement the features that every Windows C++ > compiler must have, you'll never be compatible with MS. 
Currently > Clang++ cannot create or use C++ dlls (a series of patches were > submitted but quickly rejected, I hope that Nico was not discouraged > by this.) OTOH, implementing 32bit SEH is a must-have requirement for > MS compatibility, but it would benefit Clang on Windows independently > of the C++ ABI used. Same comment. Clang still doesn't have openmp or nested function support. It is still useful. > * For being the best Windows C++ compiler, Clang++ doesn't need to be > compatible with the MS C++ ABI. However (and this is highly > detrimental to Clang, IMO) from reading the mailing lists one could > easily get the impression that until Clang++ gets the MS C++ ABI, it > is not ready for Windows. It wont surprise me to learn that some > contributors are working on the MS C++ ABI because they think that > way. Fair point, but being ABI compatible DOES enable certain applications, for example, working with existing binaries that can't be recompiled. > * Using its current C++ ABI and debug info format on Windows has the > advantage of making Clang++ usable together with tools like existing > debuggers and profilers. For using Clang with Visual Studio's IDE it > must learn how to emit debug info compatible with MS, and that is > another can of worms. > * As the MS C++ ABI is in flux, while at the same time Clang++ is well > ahead of MS on conformance, some weird cases might arise where > Clang++ would be forced to extend the ABI on its own at the risk of > being incompatible with itself after a while (MS variadic templates > implementation required an ABI change. MS engineers admit that > planned features might require more changes. Of course, Clang++ > already supports most of those features.) I don't know enough to comment on this, but I don't think that all people interested in using clang on windows necessarily care. 
> * Finally, legal issues remain: supposing that Clang++ gets MS C++ ABI > compatibility, is it possible to use MS C++ runtime(s) and libraries > with Clang++? We could end up with lots of work invested in a feature > that only benefits those who are forced to work with third-party C++ > libraries distributed in binary form. I guess that such users are a > minority. You need the technology first; once technology issues are solved, legal issues can be tackled. It is certainly true that people within Microsoft would love great clang support; perhaps in time the right legal agreements can be hammered out. > To recap: lots of work is being invested in a feature which is very hard > to implement, while at the same time Clang lacks basic features. I'll > dare to say that focusing the effort on those features would mean that, > in a year at most, Clang would become the best C++ compiler for Windows. As I said up front, it is fine for you to share your opinion about what matters (you're clearly a smart guy and very knowledgeable), please just don't turn this into blocking progress that you personally aren't interested in or don't value. -Chris From chandlerc at google.com Wed Jul 31 15:28:24 2013 From: chandlerc at google.com (Chandler Carruth) Date: Wed, 31 Jul 2013 15:28:24 -0700 Subject: [LLVMdev] MSVC++ ABI compatibility is not a Windows requirement (was: Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI) In-Reply-To: <87txjagyyz.fsf_-_@wanadoo.es> References: <51F6676E.2090800@free.fr> <51F670DF.5080804@free.fr> <878v0phswm.fsf@wanadoo.es> <87y58ngggy.fsf@wanadoo.es> <72DFC3DF-FA19-4885-AFCB-64FC229446FF@apple.com> <87txjagyyz.fsf_-_@wanadoo.es> Message-ID: First and foremost: this mailing list is simply not an appropriate place to discuss legal issues. Full stop. I won't reply or discuss them, and please don't try to do so yourself. Find a lawyer if you want to discuss such things. For the rest of your email, I defer to Chris's comments.
If you want to convince us there is a better way, please contribute excellent patches that show that better way. Lead by example. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Pidgeot18 at gmail.com Wed Jul 31 15:37:32 2013 From: Pidgeot18 at gmail.com (=?UTF-8?B?Sm9zaHVhIENyYW5tZXIg8J+Qpw==?=) Date: Wed, 31 Jul 2013 17:37:32 -0500 Subject: [LLVMdev] MSVC++ ABI compatibility is not a Windows requirement In-Reply-To: References: <51F6676E.2090800@free.fr> <51F670DF.5080804@free.fr> <878v0phswm.fsf@wanadoo.es> <87y58ngggy.fsf@wanadoo.es> <72DFC3DF-FA19-4885-AFCB-64FC229446FF@apple.com> <87txjagyyz.fsf_-_@wanadoo.es> Message-ID: <51F991AC.6090704@gmail.com> On 7/31/2013 5:24 PM, Chris Lattner wrote: > On Jul 31, 2013, at 3:13 PM, Óscar Fuentes wrote: >> * For being the best Windows C++ compiler, Clang++ doesn't need to be >> compatible with the MS C++ ABI. However (and this is highly >> detrimental to Clang, IMO) from reading the mailing lists one could >> easily get the impression that until Clang++ gets the MS C++ ABI, it >> is not ready for Windows. It wont surprise me to learn that some >> contributors are working on the MS C++ ABI because they think that >> way. > Fair point, but being ABI compatible DOES enable certain applications, for example, working with existing binaries that can't be recompiled. I would like to add that some of the Windows APIs require C++ ABI and are not (easily) usable from C, so some applications are only viable if Clang can be generally compatible with the MS C++ ABI. -- Joshua Cranmer Thunderbird and DXR developer Source code archæologist From jyasskin at google.com Wed Jul 31 16:01:32 2013 From: jyasskin at google.com (Jeffrey Yasskin) Date: Wed, 31 Jul 2013 16:01:32 -0700 Subject: [LLVMdev] Intended semantics for ``fence seq_cst`` In-Reply-To: References: Message-ID: 2013/7/31 JF Bastien : > Hi, > > TL;DR: should we add a new memory ordering to fences? 
> > > ``fence seq_cst`` is currently use to represent two things: > - GCC-style builtin ``__sync_synchronize()`` [0][1]. > - C11/C++11's sequentially-consistent thread fence > ``std::atomic_thread_fence(std::memory_order_seq_cst)`` [2]. > > As far as I understand: > - The former orders all memory and emits an actual fence instruction. > > - The latter only provides a total order with other > sequentially-consistent loads and stores, which means that it's > possible to move non-sequentially-consistent loads and stores around > it. It still acts as an acquire/release fence for any other atomic instruction. For non-atomic instructions, if you have a race, the behavior is undefined anyway, so you can't get a stronger guarantee than what "fence seq_cst" provides. I think "fence seq_cst" is completely equivalent to __sync_synchronize(), but you could convince me otherwise by providing a sample program for which there's a difference. > The GCC-style builtin effectively does the same as the C11/C++11 > sequentially-consistent thread fence, surrounded by compiler barriers > (``call void asm sideeffect "", "~{memory}"``). > > The LLVM language reference [3] describes ``fence seq_cst`` in terms > of the C11/C++11 primitive, but it looks like LLVM's codebase treats > it like the GCC-style builtin. That's strictly correct, but it seems > desirable to represent the GCC-style builtin with a ninth > LLVM-internal memory ordering that's stricter than > ``llvm::SequentiallyConsistent``. ``fence seq_cst`` could then fully > utilize C11/C++11's semantics, without breaking the GCC-style builtin. > From C11/C++11's point of view this other memory ordering isn't useful > because the primitives offered are sufficient to express valid and > performant code, but I believe that LLVM needs this new memory > ordering to accurately represent the GCC-style builtin while fully > taking advantage of the C11/C++11 memory model. > > Am I correct? 
> > I don't think it's worth implementing just yet since C11/C++11 are > still relatively new, but I'd like to be a bit forward looking for > PNaCl's sake. > > Thanks, > > JF > > > [0] http://gcc.gnu.org/onlinedocs/gcc-4.8.1/gcc/_005f_005fsync-Builtins.html > [1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36793 > [2] C++11 Standard section 29.8 - Fences > [3] http://llvm.org/docs/LangRef.html#fence-instruction > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From jfb at google.com Wed Jul 31 16:15:29 2013 From: jfb at google.com (JF Bastien) Date: Wed, 31 Jul 2013 16:15:29 -0700 Subject: [LLVMdev] Intended semantics for ``fence seq_cst`` In-Reply-To: References: Message-ID: struct { volatile int flag; int value; } s; int get_value_when_ready() { while (s.flag) ; __sync_synchronize(); return s.value; } This is "valid" legacy code on some processors, yet it's not valid to replace __sync_synchronize with an atomic_thread_fence because, in theory, LLVM could hoist the load of s.value. In practice it currently doesn't, but it may in the future if my understanding is correct. My main point is that LLVM needs to support code that was written before C and C++ got a memory model, it doesn't matter that it's undefined behavior and relies on a GCC-style builtin to be "correct". The current standards offer all you need to write new code that can express the above intended behavior, but __sync_synchronize isn't a 1:1 mapping to atomic_thread_fence(seq_cst), it has stronger semantics and that's constraining which optimizations can be done on ``fence seq_cst``. LLVM therefore probably wants to distinguish both, so that it can fully optimize C++11 code without leaving legacy code in a bad position. 2013/7/31 Jeffrey Yasskin : > 2013/7/31 JF Bastien : >> Hi, >> >> TL;DR: should we add a new memory ordering to fences? 
>> >> >> ``fence seq_cst`` is currently use to represent two things: >> - GCC-style builtin ``__sync_synchronize()`` [0][1]. >> - C11/C++11's sequentially-consistent thread fence >> ``std::atomic_thread_fence(std::memory_order_seq_cst)`` [2]. >> >> As far as I understand: >> - The former orders all memory and emits an actual fence instruction. >> >> - The latter only provides a total order with other >> sequentially-consistent loads and stores, which means that it's >> possible to move non-sequentially-consistent loads and stores around >> it. > > It still acts as an acquire/release fence for any other atomic > instruction. For non-atomic instructions, if you have a race, the > behavior is undefined anyway, so you can't get a stronger guarantee > than what "fence seq_cst" provides. > > I think "fence seq_cst" is completely equivalent to > __sync_synchronize(), but you could convince me otherwise by providing > a sample program for which there's a difference. > >> The GCC-style builtin effectively does the same as the C11/C++11 >> sequentially-consistent thread fence, surrounded by compiler barriers >> (``call void asm sideeffect "", "~{memory}"``). >> >> The LLVM language reference [3] describes ``fence seq_cst`` in terms >> of the C11/C++11 primitive, but it looks like LLVM's codebase treats >> it like the GCC-style builtin. That's strictly correct, but it seems >> desirable to represent the GCC-style builtin with a ninth >> LLVM-internal memory ordering that's stricter than >> ``llvm::SequentiallyConsistent``. ``fence seq_cst`` could then fully >> utilize C11/C++11's semantics, without breaking the GCC-style builtin. >> From C11/C++11's point of view this other memory ordering isn't useful >> because the primitives offered are sufficient to express valid and >> performant code, but I believe that LLVM needs this new memory >> ordering to accurately represent the GCC-style builtin while fully >> taking advantage of the C11/C++11 memory model. >> >> Am I correct? 
>> >> I don't think it's worth implementing just yet since C11/C++11 are >> still relatively new, but I'd like to be a bit forward looking for >> PNaCl's sake. >> >> Thanks, >> >> JF >> >> >> [0] http://gcc.gnu.org/onlinedocs/gcc-4.8.1/gcc/_005f_005fsync-Builtins.html >> [1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36793 >> [2] C++11 Standard section 29.8 - Fences >> [3] http://llvm.org/docs/LangRef.html#fence-instruction >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From tobias at grosser.es Wed Jul 31 16:30:06 2013 From: tobias at grosser.es (Tobias Grosser) Date: Wed, 31 Jul 2013 16:30:06 -0700 Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: <6237388B-B65C-4D71-8A97-C0BB6CB53018@apple.com> References: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> <51F692C1.2080901@codeaurora.org> <35E073FE-A0E4-4DE9-808C-050EEEDE00CE@apple.com> <51F7CF3C.9060603@codeaurora.org> <51F7F587.30304@gmail.com> <6237388B-B65C-4D71-8A97-C0BB6CB53018@apple.com> Message-ID: <51F99DFE.5050500@grosser.es> On 07/30/2013 09:44 PM, Chris Lattner wrote: > > On Jul 30, 2013, at 10:19 AM, Shuxin Yang wrote: > >> The pro for running LICM early is that it may move big redundant stuff out of loop nest. You never know >> how big it is. In case you are lucky , you can move lot of stuff out of >> loop, the loop may become much smaller and hence enable lots of downstream optimizations. This sound >> to be a big win for control-intensive programs where Loop-nest-opt normally is a big, expensive no-op. >> >> The con side is that, as you said, the nest is not perfect any more. However, I would argue LNO optimizations >> should be able to tackle the cases when imperfect part is simple enough (say, no call, no control etc). >> (FYI, Open64's LNO is able to tackle imperfect nesting so long as imperfect part is simple). 
Or you just reverse >> the LICM, that doesn't sound hard. > > FWIW, I completely agree with this. The canonical form should be that loop invariants are hoisted. Optimizations should not depend on perfect loops. This concept really only makes sense for Source/AST level transformations anyway, which don't apply at the LLVM IR level. Some comments from an LNO such as Polly. In general, Polly and probably many modern loop nest optimizers do not care that much about perfectly or imperfectly nested loop nests. Transformations work either way. LICM is problematic due to another reason. LICM introduces new memory dependences. Here is a simple example: Normal loop: for i for j sum[i] += A[i][j] LICM loop: for i s = sum[i] for j s += A[i][j] sum[i] = s Calculating precise dependences for the second loop yields a lot more dependences that prevent possible transformations. A LNO can always remove those LICM introduced dependences by expanding memory, but full memory expansion is impractical. Deriving the right amount of memory expansion (e.g. the one that just reverts the LICM) is a difficult problem. From a LNO perspective first deriving possible transformations, then transforming the loop and as a last step applying LICM seems to be the better option. Having said that, if there are compelling reasons outside of LNO to keep the LICM in the canonicalization pass, I can see us following Andrew's suggestion to disable LICM in case a LNO is run and having the LNO schedule an additional set of cleanup passes later on.
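[Editor's note] For concreteness, the two loop forms above can be written as compilable C. This is a sketch; the array shapes and function names are illustrative, not from the thread:

```c
#include <assert.h>

#define N 4
#define M 3

/* "Normal loop": the accumulator is the memory cell sum[i] itself,
   loaded and stored on every iteration of the j loop. */
void sum_rows_normal(int sum[N], int A[N][M]) {
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < M; ++j)
            sum[i] += A[i][j];
}

/* "LICM loop": the load and store of sum[i] are hoisted and sunk out of
   the j loop into the scalar s.  The result is identical, but s now
   carries a loop-carried dependence across the j loop -- the extra
   dependence that a precise dependence analysis must account for. */
void sum_rows_licm(int sum[N], int A[N][M]) {
    for (int i = 0; i < N; ++i) {
        int s = sum[i];
        for (int j = 0; j < M; ++j)
            s += A[i][j];
        sum[i] = s;
    }
}
```

Both routines compute the same sums; the difference is visible only to a dependence analysis, which must reason about the scalar s in the second form.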
Cheers, Tobias From shuxin.llvm at gmail.com Wed Jul 31 16:47:21 2013 From: shuxin.llvm at gmail.com (Shuxin Yang) Date: Wed, 31 Jul 2013 16:47:21 -0700 Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: <51F99DFE.5050500@grosser.es> References: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> <51F692C1.2080901@codeaurora.org> <35E073FE-A0E4-4DE9-808C-050EEEDE00CE@apple.com> <51F7CF3C.9060603@codeaurora.org> <51F7F587.30304@gmail.com> <6237388B-B65C-4D71-8A97-C0BB6CB53018@apple.com> <51F99DFE.5050500@grosser.es> Message-ID: <51F9A209.6040404@gmail.com> On 7/31/13 4:30 PM, Tobias Grosser wrote: > On 07/30/2013 09:44 PM, Chris Lattner wrote: >> >> On Jul 30, 2013, at 10:19 AM, Shuxin Yang wrote: >> >>> The pro for running LICM early is that it may move big redundant >>> stuff out of loop nest. You never know >>> how big it is. In case you are lucky , you can move lot of stuff >>> out of >>> loop, the loop may become much smaller and hence enable lots of >>> downstream optimizations. This sound >>> to be a big win for control-intensive programs where Loop-nest-opt >>> normally is a big, expensive no-op. >>> >>> The con side is that, as you said, the nest is not perfect any >>> more. However, I would argue LNO optimizations >>> should be able to tackle the cases when imperfect part is simple >>> enough (say, no call, no control etc). >>> (FYI, Open64's LNO is able to tackle imperfect nesting so long as >>> imperfect part is simple). Or you just reverse >>> the LICM, that dosen't sound hard. >> >> FWIW, I completely agree with this. The canonical form should be >> that loop invariants are hoisted. Optimizations should not depend on >> perfect loops. This concept really only makes sense for Source/AST >> level transformations anyway, which don't apply at the LLVM IR level. > > Some comments from an LNO such as Polly. 
In general, Polly and > probably many modern loop nest optimizers do not care that much about > perfectly or imperfectly nested loop nests. Transformations work > either way. > > LICM is problematic due to another reason. LICM introduces new memory > dependences. Here is a simple example: I'm pretty sure Open64's LNO is able to revert LICM-ed loop back to what it was. > > Normal loop: > > for i > for j > sum[i] += A[i][j] > > LICM loop: > > for i > s = sum[i] > for j > s += A[i][j] > sum[i] = s > > > Calculating precise dependences for the second loop yields a lot more > dependences that prevent possible transformations. A LNO can always > remove those LICM introduced dependences by expanding memory, but full > memory expansion is impractical. Deriving the right amount of memory > expansion (e.g. the one that just reverts the LICM) is a difficult > problem. From a LNO perspective first deriving possible > transformations, then transforming the loop and as a last step > applying LICM seems to be the better option. > > Having said that, if there are compelling reasons outside of LNO to > keep the LICM in the canonicalization pass, I can see us following > Andrew's suggestion to disable LICM in case a LNO is run and having the > LNO schedule an additional set of cleanup passes later on.
> > Cheers, > Tobias > > > From shuxin.llvm at gmail.com Wed Jul 31 16:55:01 2013 From: shuxin.llvm at gmail.com (Shuxin Yang) Date: Wed, 31 Jul 2013 16:55:01 -0700 Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: <51F9A209.6040404@gmail.com> References: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> <51F692C1.2080901@codeaurora.org> <35E073FE-A0E4-4DE9-808C-050EEEDE00CE@apple.com> <51F7CF3C.9060603@codeaurora.org> <51F7F587.30304@gmail.com> <6237388B-B65C-4D71-8A97-C0BB6CB53018@apple.com> <51F99DFE.5050500@grosser.es> <51F9A209.6040404@gmail.com> Message-ID: <51F9A3D5.1070609@gmail.com> On 7/31/13 4:47 PM, Shuxin Yang wrote: > > On 7/31/13 4:30 PM, Tobias Grosser wrote: >> On 07/30/2013 09:44 PM, Chris Lattner wrote: >>> >>> On Jul 30, 2013, at 10:19 AM, Shuxin Yang >>> wrote: >>> >>>> The pro for running LICM early is that it may move big redundant >>>> stuff out of loop nest. You never know >>>> how big it is. In case you are lucky , you can move lot of stuff >>>> out of >>>> loop, the loop may become much smaller and hence enable lots of >>>> downstream optimizations. This sound >>>> to be a big win for control-intensive programs where Loop-nest-opt >>>> normally is a big, expensive no-op. >>>> >>>> The con side is that, as you said, the nest is not perfect any >>>> more. However, I would argue LNO optimizations >>>> should be able to tackle the cases when imperfect part is simple >>>> enough (say, no call, no control etc). >>>> (FYI, Open64's LNO is able to tackle imperfect nesting so long as >>>> imperfect part is simple). Or you just reverse >>>> the LICM, that dosen't sound hard. >>> >>> FWIW, I completely agree with this. The canonical form should be >>> that loop invariants are hoisted. Optimizations should not depend >>> on perfect loops. This concept really only makes sense for >>> Source/AST level transformations anyway, which don't apply at the >>> LLVM IR level. >> >> Some comments from an LNO such as Polly. 
In general, Polly and >> probably many modern loop nest optimizers do not care that much about >> perfectly or imperfectly nested loop nests. Transformations work >> either way. >> >> LICM is problematic due to another reason. LICM introduces new memory >> dependences. Here is a simple example: > > I'm pretty sure Open64's LNO is able to revert LICM-ed loop back to > what it was. Recalling a little bit more: it must be done in the forwarding pass. > > >> >> Normal loop: >> >> for i >> for j >> sum[i] += A[i][j] >> >> LICM loop: >> >> for i >> s = sum[i] >> for j >> s += A[i][j] >> sum[i] = s >> >> >> Calculating precise dependences for the second loop yields a lot more >> dependences that prevent possible transformations. A LNO can always >> remove those LICM introduced dependences by expanding memory, but >> full memory expansion is impractical. Deriving the right amount of >> memory expansion (e.g. the one that just reverts the LICM) is a >> difficult problem. From a LNO perspective first deriving possible >> transformations, then transforming the loop and as a last step >> applying LICM seems to be the better option. >> >> Having said that, if there are compelling reasons outside of LNO to >> keep the LICM in the canonicalization pass, I can see us following >> Andrew's suggestion to disable LICM in case a LNO is run and having >> the LNO schedule an additional set of cleanup passes later on.
>> >> Cheers, >> Tobias >> >> >> > From ripzonetriton at gmail.com Wed Jul 31 17:21:05 2013 From: ripzonetriton at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Matos?=) Date: Thu, 1 Aug 2013 01:21:05 +0100 Subject: [LLVMdev] [cfe-dev] MSVC++ ABI compatibility is not a Windows requirement In-Reply-To: <87d2pygtpn.fsf@wanadoo.es> References: <51F6676E.2090800@free.fr> <51F670DF.5080804@free.fr> <878v0phswm.fsf@wanadoo.es> <87y58ngggy.fsf@wanadoo.es> <72DFC3DF-FA19-4885-AFCB-64FC229446FF@apple.com> <87txjagyyz.fsf_-_@wanadoo.es> <87d2pygtpn.fsf@wanadoo.es> Message-ID: On Thu, Aug 1, 2013 at 1:06 AM, Óscar Fuentes wrote: > * To understand why they chose to work on MS C++ ABI compatibility > instead of other most basic missing features that preclude using > Clang for serious C++ development. I'll be grateful if anyone > involved on the MS C++ ABI comments on this (it was already mentioned > that Wine is interested on that feature, but I don't understand why.) > Clang is also being increasingly used by tools and having proper MSVC C++ ABI support is very valuable for extracting all kinds of information for interop purposes (record layouts, vftable layouts, mangling, etc). > * To ensure that everybody knows that supporting the MS C++ ABI is not > required at all for being a Windows compiler. It is if you need to use closed source libraries from vendors that only publish MSVC builds. -- João Matos -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chandlerc at google.com Wed Jul 31 17:37:36 2013 From: chandlerc at google.com (Chandler Carruth) Date: Wed, 31 Jul 2013 17:37:36 -0700 Subject: [LLVMdev] [cfe-dev] MSVC++ ABI compatibility is not a Windows requirement In-Reply-To: <87d2pygtpn.fsf@wanadoo.es> References: <51F6676E.2090800@free.fr> <51F670DF.5080804@free.fr> <878v0phswm.fsf@wanadoo.es> <87y58ngggy.fsf@wanadoo.es> <72DFC3DF-FA19-4885-AFCB-64FC229446FF@apple.com> <87txjagyyz.fsf_-_@wanadoo.es> <87d2pygtpn.fsf@wanadoo.es> Message-ID: On Wed, Jul 31, 2013 at 5:06 PM, Óscar Fuentes wrote: > prompts me to ask if it is right to submit to the mailing lists patches > implementing a feature that is known to be legally "controversial". > I have no idea what you are talking about, but if you have legal or other questions, or want to discuss 'legal controversies', this isn't the right forum. Please do not do it again here. We do not discuss legal issues on this mailing list. This isn't about you, or this subject, or anything. It's simply not an appropriate place *for anyone* to ask about or formulate opinions about legal topics. -------------- next part -------------- An HTML attachment was scrubbed... URL:
From jyasskin at google.com Wed Jul 31 17:49:33 2013 From: jyasskin at google.com (Jeffrey Yasskin) Date: Wed, 31 Jul 2013 17:49:33 -0700 Subject: [LLVMdev] Intended semantics for ``fence seq_cst`` In-Reply-To: References: Message-ID: Ok, so the semantics of your fence would be that it's a volatile memory access (http://llvm.org/docs/LangRef.html#volatile-memory-accesses), and that it provides happens-before edges for volatile accesses in the same way that a seq_cst fence provides for atomic accesses. FWIW, I don't think we should add that, because it's an attempt to define behavior that's undefined for other reasons (the data race on the volatile). If you (PNaCl?) explicitly want to define the behavior of legacy code that used 'volatile' for synchronization (which has always been undefined behavior outside of assembly, even before the memory model; it just happened to work in many cases), could you compile volatile accesses to 'atomic volatile monotonic' accesses? Then the normal memory model would apply, and I don't think the instructions emitted would change at all on the platforms I'm familiar with. Jeffrey On Wed, Jul 31, 2013 at 4:15 PM, JF Bastien wrote: > struct { > volatile int flag; > int value; > } s; > > int get_value_when_ready() { > while (s.flag) ; > __sync_synchronize(); > return s.value; > } > > This is "valid" legacy code on some processors, yet it's not valid to > replace __sync_synchronize with an atomic_thread_fence because, in > theory, LLVM could hoist the load of s.value.
In practice it currently > doesn't, but it may in the future if my understanding is correct. > > My main point is that LLVM needs to support code that was written > before C and C++ got a memory model, it doesn't matter that it's > undefined behavior and relies on a GCC-style builtin to be "correct". > The current standards offer all you need to write new code that can > express the above intended behavior, but __sync_synchronize isn't a > 1:1 mapping to atomic_thread_fence(seq_cst), it has stronger semantics > and that's constraining which optimizations can be done on ``fence > seq_cst``. LLVM therefore probably wants to distinguish both, so that > it can fully optimize C++11 code without leaving legacy code in a bad > position. > > 2013/7/31 Jeffrey Yasskin : >> 2013/7/31 JF Bastien : >>> Hi, >>> >>> TL;DR: should we add a new memory ordering to fences? >>> >>> >>> ``fence seq_cst`` is currently use to represent two things: >>> - GCC-style builtin ``__sync_synchronize()`` [0][1]. >>> - C11/C++11's sequentially-consistent thread fence >>> ``std::atomic_thread_fence(std::memory_order_seq_cst)`` [2]. >>> >>> As far as I understand: >>> - The former orders all memory and emits an actual fence instruction. >>> >>> - The latter only provides a total order with other >>> sequentially-consistent loads and stores, which means that it's >>> possible to move non-sequentially-consistent loads and stores around >>> it. >> >> It still acts as an acquire/release fence for any other atomic >> instruction. For non-atomic instructions, if you have a race, the >> behavior is undefined anyway, so you can't get a stronger guarantee >> than what "fence seq_cst" provides. >> >> I think "fence seq_cst" is completely equivalent to >> __sync_synchronize(), but you could convince me otherwise by providing >> a sample program for which there's a difference. 
>> >>> The GCC-style builtin effectively does the same as the C11/C++11 >>> sequentially-consistent thread fence, surrounded by compiler barriers >>> (``call void asm sideeffect "", "~{memory}"``). >>> >>> The LLVM language reference [3] describes ``fence seq_cst`` in terms >>> of the C11/C++11 primitive, but it looks like LLVM's codebase treats >>> it like the GCC-style builtin. That's strictly correct, but it seems >>> desirable to represent the GCC-style builtin with a ninth >>> LLVM-internal memory ordering that's stricter than >>> ``llvm::SequentiallyConsistent``. ``fence seq_cst`` could then fully >>> utilize C11/C++11's semantics, without breaking the GCC-style builtin. >>> From C11/C++11's point of view this other memory ordering isn't useful >>> because the primitives offered are sufficient to express valid and >>> performant code, but I believe that LLVM needs this new memory >>> ordering to accurately represent the GCC-style builtin while fully >>> taking advantage of the C11/C++11 memory model. >>> >>> Am I correct? >>> >>> I don't think it's worth implementing just yet since C11/C++11 are >>> still relatively new, but I'd like to be a bit forward looking for >>> PNaCl's sake. 
>>> >>> Thanks, >>> >>> JF >>> >>> >>> [0] http://gcc.gnu.org/onlinedocs/gcc-4.8.1/gcc/_005f_005fsync-Builtins.html >>> [1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36793 >>> [2] C++11 Standard section 29.8 - Fences >>> [3] http://llvm.org/docs/LangRef.html#fence-instruction >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From jfb at google.com Wed Jul 31 18:10:19 2013 From: jfb at google.com (JF Bastien) Date: Wed, 31 Jul 2013 18:10:19 -0700 Subject: [LLVMdev] Intended semantics for ``fence seq_cst`` In-Reply-To: References: Message-ID: > FWIW, I don't think we should add that, because it's an attempt to > define behavior that's undefined for other reasons (the data race on > the volatile). I had a discussion with Chandler and others, and something I misunderstood was pointed out: it is not an explicit goal of LLVM to support or continue supporting legacy code that did what it had to to express functional concurrent code. It may happen to work now, but unless enough LLVM users express interest this may break one day, and __sync_synchronize may not order anything (it may just emit a fence without forcing any ordering). It was pointed out that it's not clear that __sync_synchronize has a clear spec, and that implementing it properly in LLVM may not be tractable or worthwhile. > If you (PNaCl?) explicitly want to define the behavior of legacy code > that used 'volatile' for synchronization (which has always been > undefined behavior outside of assembly, even before the memory model; > it just happened to work in many cases), could you compile volatile > accesses to 'atomic volatile monotonic' accesses? Then the normal > memory model would apply, and I don't think the instructions emitted > would change at all on the platforms I'm familiar with. I actually go further for now and promote volatiles to seq_cst atomics. 
This promotion happens after opt, but before most architecture-specific optimizations. I could have used relaxed ordering, but as a conservative first approach went with seq_cst. For PNaCl it's correct because we only support 8/16/32/64 bit types, require natural alignment (though we should provide better diagnostics), packed volatiles can be split, we don't allow direct device access, and we don't allow mapping the same physical address at multiple virtual addresses. We could relax this at a later time, but we'd really be better off if C++11's better-defined memory model were used in the code. This thread (and the side discussion) answered my concern, though with a solution I didn't expect: trying to make user code that doesn't use volatile or atomic, but does use __sync_synchronize, work as intended isn't one of LLVM's goals. From chandlerc at google.com Wed Jul 31 18:16:55 2013 From: chandlerc at google.com (Chandler Carruth) Date: Wed, 31 Jul 2013 18:16:55 -0700 Subject: [LLVMdev] Intended semantics for ``fence seq_cst`` In-Reply-To: References: Message-ID: On Wed, Jul 31, 2013 at 6:10 PM, JF Bastien wrote: > This promotion happens after opt, but before most > architecture-specific optimizations > You will need to do this in the frontend. The target independent optimizers are allowed to use the memory model. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfb at google.com Wed Jul 31 18:39:07 2013 From: jfb at google.com (JF Bastien) Date: Wed, 31 Jul 2013 18:39:07 -0700 Subject: [LLVMdev] Intended semantics for ``fence seq_cst`` In-Reply-To: References: Message-ID: > You will need to do this in the frontend. The target independent optimizers are allowed to use the memory model. We discussed doing this, and concluded that doing it pre-opt was overly restrictive on correct code. Doing it post-opt bakes the behavior into the portable code, so in a way it'll be reliably broken but won't penalize good code. 
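[Editor's note] The legacy flag-publication pattern discussed earlier in this thread can be expressed directly against the C11 memory model. The sketch below is an editor's illustration rather than code from the thread; the flag polarity (0 = not ready) and the function names are assumptions:

```c
#include <assert.h>
#include <stdatomic.h>

static struct {
    atomic_int flag;  /* 0 = not ready, 1 = ready (assumed polarity) */
    int value;        /* plain data, published through the flag */
} s;

/* Producer: write the data, then publish it with a release store, so the
   write to value happens-before any acquire load that observes flag == 1. */
void publish_value(int v) {
    s.value = v;
    atomic_store_explicit(&s.flag, 1, memory_order_release);
}

/* Consumer: spin until an acquire load observes the flag.  Unlike a plain
   volatile spin, the compiler may not hoist the load of value above the
   acquire, and no full sequentially-consistent fence is needed. */
int get_value_when_ready(void) {
    while (!atomic_load_explicit(&s.flag, memory_order_acquire))
        ;
    return s.value;
}
```

In real use the two functions run on different threads; the release/acquire pair replaces the volatile-plus-__sync_synchronize idiom.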
FWIW it's easy to change from one to the other: move one line of code. I hope my explanation makes sense, and it doesn't look like I'm dismissing your comment on implementation issues. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kparzysz at codeaurora.org Wed Jul 31 18:39:26 2013 From: kparzysz at codeaurora.org (Krzysztof Parzyszek) Date: Wed, 31 Jul 2013 20:39:26 -0500 Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: <51F99DFE.5050500@grosser.es> References: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> <51F692C1.2080901@codeaurora.org> <35E073FE-A0E4-4DE9-808C-050EEEDE00CE@apple.com> <51F7CF3C.9060603@codeaurora.org> <51F7F587.30304@gmail.com> <6237388B-B65C-4D71-8A97-C0BB6CB53018@apple.com> <51F99DFE.5050500@grosser.es> Message-ID: <51F9BC4E.4070805@codeaurora.org> On 7/31/2013 6:30 PM, Tobias Grosser wrote: > > I can see us following Andrews > suggestion to disable LICM in case a LNO is run and having the LNO > schedule an additional set of cleanup passes later on. The way I was thinking about it is that LICM could be optionally added to the preparation steps, something like requiring loop-closed SSA form for certain transformations. -K -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation From chandlerc at google.com Wed Jul 31 18:46:05 2013 From: chandlerc at google.com (Chandler Carruth) Date: Wed, 31 Jul 2013 18:46:05 -0700 Subject: [LLVMdev] Intended semantics for ``fence seq_cst`` In-Reply-To: References: Message-ID: On Wed, Jul 31, 2013 at 6:39 PM, JF Bastien wrote: > > You will need to do this in the frontend. The target independent > optimizers are allowed to use the memory model. > > We discussed doing this, and concluded that doing it pre-opt was overly > restrictive on correct code. Doing it post-opt bakes the behavior into the > portable code, so in a way it'll be reliably broken but won't penalize good > code. 
> > FWIW it's easy to change from one to the other: move one line of code. I > hope my explanation makes sense, and it doesn't look like I'm dismissing > your comment on implementation issues. > It doesn't really make sense to me. The most likely way for the optimizer to break any of this is in the middle end. By only fixing it afterward, I don't see what the advantage of fixing it at all is... As Jeffrey pointed out, the penalty is relatively low on x86. -------------- next part -------------- An HTML attachment was scrubbed... URL: From shuxin.llvm at gmail.com Wed Jul 31 19:22:37 2013 From: shuxin.llvm at gmail.com (Shuxin Yang) Date: Wed, 31 Jul 2013 19:22:37 -0700 Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: <51F9BC4E.4070805@codeaurora.org> References: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> <51F692C1.2080901@codeaurora.org> <35E073FE-A0E4-4DE9-808C-050EEEDE00CE@apple.com> <51F7CF3C.9060603@codeaurora.org> <51F7F587.30304@gmail.com> <6237388B-B65C-4D71-8A97-C0BB6CB53018@apple.com> <51F99DFE.5050500@grosser.es> <51F9BC4E.4070805@codeaurora.org> Message-ID: <51F9C66D.6090007@gmail.com> Now I see a strong value for running LICM earlier -- to brutally expose LNO's stupidity:-) On 7/31/13 6:39 PM, Krzysztof Parzyszek wrote: > On 7/31/2013 6:30 PM, Tobias Grosser wrote: >> >> I can see us following Andrews >> suggestion to disable LICM in case a LNO is run and having the LNO >> schedule an additional set of cleanup passes later on. > > The way I was thinking about it is that LICM could be optionally added > to the preparation steps, something like requiring loop-closed SSA > form for certain transformations. 
> > -K > From shuxin.llvm at gmail.com Wed Jul 31 19:24:10 2013 From: shuxin.llvm at gmail.com (Shuxin Yang) Date: Wed, 31 Jul 2013 19:24:10 -0700 Subject: [LLVMdev] IR Passes and TargetTransformInfo: Straw Man In-Reply-To: <51F9A3D5.1070609@gmail.com> References: <85172323-EBBD-48E2-B719-74D46BB89030@apple.com> <51F692C1.2080901@codeaurora.org> <35E073FE-A0E4-4DE9-808C-050EEEDE00CE@apple.com> <51F7CF3C.9060603@codeaurora.org> <51F7F587.30304@gmail.com> <6237388B-B65C-4D71-8A97-C0BB6CB53018@apple.com> <51F99DFE.5050500@grosser.es> <51F9A209.6040404@gmail.com> <51F9A3D5.1070609@gmail.com> Message-ID: <51F9C6CA.8070806@gmail.com> Back from work, I can now access O64's src; excerpt below, FYI: cat -n be/lno/forward.h 94 *** In backward substitution, we look for statements of the form 95 *** "array = scalar", and backward substitute the scalar references 96 *** if some of them occur in a deeper loop and the array reference 97 *** is not reassigned over the life of the scalar's equivalence class.
98 *** This lets us transform the following: 99 *** 100 *** subroutine fs(a, b, c, n) 101 *** integer i, j, k, n 102 *** real s, a(n,n), b(n,n), c(n,n) 103 *** do i = 1, n 104 *** do j = 1, n 105 *** s = c(i,j) 106 *** do k = 1, n 107 *** s = s + a(i,k) * b(k,j) 108 *** end do 109 *** c(i,j) = s 110 *** end do 111 *** end do 112 *** end 113 *** 114 *** into: 115 *** 116 *** subroutine fs(a, b, c, n) 117 *** integer i, j, k, n 118 *** real s, a(n,n), b(n,n), c(n,n) 119 *** do i = 1, n 120 *** do j = 1, n 121 *** do k = 1, n 122 *** c(i,j) = c(i,j) + a(i,k) * b(k,j) 123 *** end do 124 *** end do 125 *** end do 126 *** end >> >>> >>> Normal loop: >>> >>> for i >>> for j >>> sum[i] += A[i][j] >>> >>> LICM loop: >>> >>> for i >>> s = sum[i] >>> for j >>> s += A[i][j] >>> sum[i] = s >>> >>> From tanmx_star at yeah.net Wed Jul 31 19:28:31 2013 From: tanmx_star at yeah.net (Star Tan) Date: Thu, 1 Aug 2013 10:28:31 +0800 (CST) Subject: [LLVMdev] [Polly] Update of Polly compile-time performance on LLVM test-suite In-Reply-To: <6a634f6b.e.14030889341.Coremail.tanmx_star@yeah.net> References: <6a634f6b.e.14030889341.Coremail.tanmx_star@yeah.net> Message-ID: <1dc5f65d.12f3.14037b48368.Coremail.tanmx_star@yeah.net> Hi all, I have also evaluated Polly compile-time performance with our patch file for the polly-dependence pass. Results can be viewed on: http://188.40.87.11:8000/db_default/v4/nts/23?baseline=18&compare_to=18 With this patch file, Polly would only create a single parameter for memory accesses that share the same loop variable with different base address values. As a result, it can significantly reduce compile-time for some array-intensive benchmarks such as lu (reduced by 83.65%) and AMGMK (reduced by 56.24%). For our standard benchmark as shown in http://llvm.org/bugs/show_bug.cgi?id=14240, the total compile-time is reduced to 0.0164s from 154.5389s. Especially, the compile-time of polly-dependence is reduced to 0.0066s (40.5%) from 148.8800s (96.3%).
Cheers, Star Tan At 2013-07-31 01:03:11,"Star Tan" wrote: Hi Tobias and all Polly developers, I have re-evaluated the Polly compile-time performance using the newest LLVM/Polly source code. You can view the results on http://188.40.87.11:8000. In particular, I also evaluated our r187102 patch file that avoids expensive failure string operations in normal execution. Specifically, I evaluated two cases for it: Polly-NoCodeGen: clang -O3 -load LLVMPolly.so -mllvm -polly-optimizer=none -mllvm -polly-code-generator=none http://188.40.87.11:8000/db_default/v4/nts/16?compare_to=9&baseline=9&aggregation_fn=median Polly-Opt: clang -O3 -load LLVMPolly.so -mllvm -polly http://188.40.87.11:8000/db_default/v4/nts/18?compare_to=11&baseline=11&aggregation_fn=median The "Polly-NoCodeGen" case is mainly used to compare the compile-time performance of the polly-detect pass. As shown in the results, our patch file significantly reduces the compile-time overhead for some benchmarks such as tramp3dv4 (24.2%), simple_types_constant_folding (12.6%), oggenc (9.1%), loop_unroll (7.8%). The "Polly-Opt" case is used to compare the whole compile-time performance of Polly. Since our patch file mainly affects the Polly-Detect pass, it shows similar performance to "Polly-NoCodeGen". As shown in the results, it reduces the compile-time overhead of some benchmarks such as tramp3dv4 (23.7%), simple_types_constant_folding (12.9%), oggenc (8.3%), loop_unroll (7.5%). Finally, I also evaluated the performance of the ScopBottomUp patch that changes the top-down scop detection into bottom-up scop detection. Results can be viewed by: pNoCodeGen-ScopBottomUp: clang -O3 -load LLVMPolly.so (v.s. LLVMPolly-ScopBottomUp.so) -mllvm -polly-optimizer=none -mllvm -polly-code-generator=none http://188.40.87.11:8000/db_default/v4/nts/21?compare_to=16&baseline=16&aggregation_fn=median pOpt-ScopBottomUp: clang -O3 -load LLVMPolly.so (v.s. 
LLVMPolly-ScopBottomUp.so) -mllvm -polly http://188.40.87.11:8000/db_default/v4/nts/19?compare_to=18&baseline=18&aggregation_fn=median (*Both of these results are based on LLVM r187116, which includes the r187102 patch file that we discussed above.) Please note that this patch file leads to some errors in the Polly tests, so the data shown here cannot be regarded as conclusive results. For example, this patch significantly reduces the compile-time overhead of SingleSource/Benchmarks/Shootout/nestedloop only because it regards the nested loop as an invalid scop and skips all following transformations and optimizations. However, I evaluated it here to see its potential performance impact. Based on the results shown on http://188.40.87.11:8000/db_default/v4/nts/21?compare_to=16&baseline=16&aggregation_fn=median, we can see that detecting scops bottom-up may further reduce Polly compile-time by more than 10%. Best wishes, Star Tan -------------- next part -------------- An HTML attachment was scrubbed... URL: From navy.xliu at gmail.com Wed Jul 31 20:51:41 2013 From: navy.xliu at gmail.com (Liu Xin) Date: Thu, 1 Aug 2013 11:51:41 +0800 Subject: [LLVMdev] can i avoid saving CSRs for functions with noreturn Message-ID: Hi, list, I am making an LLVM compiler for shader-like programs. As is well known, shader programs are short and have few function calls. I found that I have to save/restore callee-saved registers (CSRs) in the prolog and epilog. Because I can deviate from the ABI between the driver (C code) and the shader, I plan to append the attribute 'noreturn' to all shader functions. In PrologEpilogInserter.cpp, you can find that it actually honors an attribute 'naked' which avoids saving CSRs; however, it also skips generating the stack-pointer adjustment, which I need. My patch is as follows. I am targeting a RISC processor. Can anyone tell me whether this patch is generic? 
diff --git a/lib/CodeGen/PrologEpilogInserter.cpp b/lib/CodeGen/PrologEpilogInserter.cpp index c791ffb..f19b47a 100644 --- a/lib/CodeGen/PrologEpilogInserter.cpp +++ b/lib/CodeGen/PrologEpilogInserter.cpp @@ -96,7 +96,7 @@ bool PEI::runOnMachineFunction(MachineFunction &Fn) { placeCSRSpillsAndRestores(Fn); // Add the code to save and restore the callee saved registers - if (!F->hasFnAttr(Attribute::Naked)) + if (!F->hasFnAttr(Attribute::Naked) && !F->hasFnAttr(Attribute::NoReturn)) insertCSRSpillsAndRestores(Fn); // Allow the target machine to make final modifications to the function thanks, --lx -------------- next part -------------- An HTML attachment was scrubbed... URL: From tanmx_star at yeah.net Wed Jul 31 21:23:21 2013 From: tanmx_star at yeah.net (Star Tan) Date: Thu, 1 Aug 2013 12:23:21 +0800 (CST) Subject: [LLVMdev] [Polly] Update of Polly compile-time performance on LLVM test-suite In-Reply-To: <51F92451.9070503@grosser.es> References: <6a634f6b.e.14030889341.Coremail.tanmx_star@yeah.net> <51F92451.9070503@grosser.es> Message-ID: <7da754ed.1f75.140381da371.Coremail.tanmx_star@yeah.net> At 2013-07-31 22:50:57,"Tobias Grosser" wrote: >On 07/30/2013 10:03 AM, Star Tan wrote: >> Hi Tobias and all Polly developers, >> >> I have re-evaluated the Polly compile-time performance using newest >> LLVM/Polly source code. You can view the results on >> http://188.40.87.11:8000 >> . >> >> Especially, I also evaluated our r187102 patch file that avoids expensive >> failure string operations in normal execution. 
Specifically, I evaluated >> two cases for it: >> >> Polly-NoCodeGen: clang -O3 -load LLVMPolly.so -mllvm >> -polly-optimizer=none -mllvm -polly-code-generator=none >> http://188.40.87.11:8000/db_default/v4/nts/16?compare_to=9&baseline=9&aggregation_fn=median >> Polly-Opt: clang -O3 -load LLVMPolly.so -mllvm -polly >> http://188.40.87.11:8000/db_default/v4/nts/18?compare_to=11&baseline=11&aggregation_fn=median >> >> The "Polly-NoCodeGen" case is mainly used to compare the compile-time >> performance for the polly-detect pass. As shown in the results, our >> patch file could significantly reduce the compile-time overhead for some >> benchmarks such as tramp3dv4 >> (24.2%), simple_types_constant_folding >> (12.6%), >> oggenc >> (9.1%), >> loop_unroll >> (7.8%) > >Very nice! > >Though I am surprised to also see performance regressions. They are all >in very shortly executing kernels, so they may very well be measuring >noise. Is this really the case? Yes, it seems that shortly executing benchmarks always show huge unexpected noise, even when we run 10 samples for a test. I have changed the ignore_small abs value to 0.05 from the original 0.01, which means benchmarks with a performance delta of less than 0.05s would be skipped. In that case, the results seem to be much more stable. However, I have noticed that there are many other Polly patches between the two versions r185399 and r187116. They may also affect the compile-time performance. I will re-evaluate the LLVM test-suite to see the performance improvements caused only by our patch. > >Also, it may be interesting to compare against the non-polly case to see >how much overhead there is still due to our scop detection. > >> The "Polly-opt" case is used to compare the whole compile-time >> performance of Polly. Since our patch file mainly affects the >> Polly-Detect pass, it shows similar performance to "Polly-NoCodeGen". 
As >> shown in the results, it reduces the compile-time overhead of some >> benchmarks such as tramp3dv4 >> (23.7%), simple_types_constant_folding >> (12.9%), >> oggenc >> (8.3%), >> loop_unroll >> (7.5%) >> >> Finally, I also evaluated the performance of the ScopBottomUp patch that >> changes the top-down scop detection into bottom-up scop detection. >> Results can be viewed by: >> pNoCodeGen-ScopBottomUp: clang -O3 -load LLVMPolly.so (v.s. >> LLVMPolly-ScopBottomUp.so) -mllvm -polly-optimizer=none -mllvm >> -polly-code-generator=none >> http://188.40.87.11:8000/db_default/v4/nts/21?compare_to=16&baseline=16&aggregation_fn=median >> pOpt-ScopBottomUp: clang -O3 -load LLVMPolly.so (v.s. >> LLVMPolly-ScopBottomUp.so) -mllvm -polly >> http://188.40.87.11:8000/db_default/v4/nts/19?compare_to=18&baseline=18&aggregation_fn=median >> (*Both of these results are based on LLVM r187116, which includes >> the r187102 patch file that we discussed above.) >> >> Please note that this patch file leads to some errors in >> the Polly tests, so the data shown here cannot be regarded as conclusive >> results. For example, this patch significantly reduces the >> compile-time overhead of SingleSource/Benchmarks/Shootout/nestedloop >> only >> because it regards the nested loop as an invalid scop and skips all >> following transformations and optimizations. However, I evaluated it >> here to see its potential performance impact. Based on the results >> shown on >> http://188.40.87.11:8000/db_default/v4/nts/21?compare_to=16&baseline=16&aggregation_fn=median, >> we can see detecting scops bottom-up may further reduce Polly >> compile-time by more than 10%. > >Interesting. For some reason it also regresses huffbench quite a bit. This is because the ScopBottomUp patch file invalidates the scop detection for huffbench. 
The run-times of huffbench with different options are as follows: clang: 19.1680s (see runid=14) polly without ScopBottomUp patch file: 14.8340s (see runid=16) polly with ScopBottomUp patch file: 19.2920s (see runid=21) As you can see, with the ScopBottomUp patch file Polly shows almost the same execution performance as clang. That is because no valid scops are detected with this patch file at all. >:-( I think here an up-to-date non-polly to polly comparison would come >handy to see which benchmarks we still see larger performance >regressions. And if the bottom-up scop detection actually helps here. >As this is a larger patch, we should really have a need for it before >switching to it. > I have evaluated Polly compile-time performance for the following options: clang: clang -O3 (runid: 14) pBasic: clang -O3 -load LLVMPolly.so (runid:15) pNoGen: pollycc -O3 -mllvm -polly-optimizer=none -mllvm -polly-code-generator=none (runid:16) pNoOpt: pollycc -O3 -mllvm -polly-optimizer=none (runid:17) pOpt: pollycc -O3 (runid:18) For example, you can view the comparison between "clang" and "pNoGen" with: http://188.40.87.11:8000/db_default/v4/nts/16?compare_to=14&baseline=14 It shows that without the optimizer and code generator, Polly leads to less than 30% extra compile-time overhead. For the execution performance, it is interesting that pNoGen significantly improves the execution performance of some benchmarks (nestedloop/huffbench) but also significantly reduces that of another set of benchmarks (gcc-loops/lpbench). Thanks, Star Tan -------------- next part -------------- An HTML attachment was scrubbed... URL: From rockysui at gmail.com Wed Jul 31 23:24:15 2013 From: rockysui at gmail.com (Yulei Sui) Date: Thu, 1 Aug 2013 16:24:15 +1000 Subject: [LLVMdev] Lower ConstantExprs into instructions Message-ID: Hello, I found an out-of-date ConstantExpressionsLower pass in llvm-1.5; it was removed in later versions. 
Is there any existing similar pass which converts all constant expressions into instructions for recent releases (e.g., 3.1 or above)? Thanks Yulei From jfb at google.com Wed Jul 31 23:31:00 2013 From: jfb at google.com (JF Bastien) Date: Wed, 31 Jul 2013 23:31:00 -0700 Subject: [LLVMdev] Intended semantics for ``fence seq_cst`` In-Reply-To: References: Message-ID: > It doesn't really make sense to me. The most likely way for the optimizer > to break any of this is in the middle end. By only fixing it afterward, I > don't see what the advantage of fixing it at all is... > Actually I think you're right: we also transform atomics to stable intrinsics, which we then transform back to LLVM IR on the user-side. Using these intrinsics pre-opt would be detrimental to overall performance, but doing volatile->atomic pre-opt and doing atomic->intrinsic post-opt should be OK. As Jeffrey pointed out, the penalty is relatively low on x86. > Yes, we discussed the performance of our approach extensively for x86-32, x86-64, ARM, MIPS and other potential targets. Our current approach isn't the best one for performance, but it's definitely conservative and potentially more portable in the face of bugs, while still being fast-ish and allowing us to loosen up in future releases. It seems like a nice tradeoff for a first launch. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Milind.Chabbi at rice.edu Mon Jul 29 00:39:01 2013 From: Milind.Chabbi at rice.edu (Milind Chabbi) Date: Mon, 29 Jul 2013 00:39:01 -0700 Subject: [LLVMdev] opt -O3 causes Assertion `New->getType() == getType() && "replaceAllUses of value with new value of different type!"' failed Message-ID: I am hitting an LLVM assertion from the llc tool iff the bitcode file is optimized at the -O3 level by opt. -O1 and -O2 levels of opt do not cause this assert. LLVM version 3.4svn DEBUG build with assertions. Built Jul 14 2013 (15:39:08). 
Default target: x86_64-unknown-linux-gnu Host CPU: amdfam10 I have attached the input bc file before -O3 optimization :bzip2.del.bc.tgz I have attached the input bc file after -O3 optimization : bzip2.del.opt.bc.tgz Command to run on -O3 optimized file: llc -cppgen=program bzip2.del.opt.bc Call stack: llc: Value.cpp:307: void llvm::Value::replaceAllUsesWith(llvm::Value*): Assertion `New->getType() == getType() && "replaceAllUses of value with new value of different type!"' failed. 0 llc 0x00000000014c716a llvm::sys::PrintStackTrace(_IO_FILE*) + 38 1 llc 0x00000000014c73d1 2 llc 0x00000000014c7718 3 libpthread.so.0 0x000000333060eb10 4 libc.so.6 0x000000332fa30265 gsignal + 53 5 libc.so.6 0x000000332fa31d10 abort + 272 6 libc.so.6 0x000000332fa296e6 __assert_fail + 246 7 llc 0x00000000014702df llvm::Value::replaceAllUsesWith(llvm::Value*) + 173 8 llc 0x00000000006b91e3 llvm::BitcodeReaderValueList::ResolveConstantForwardRefs() + 1127 9 llc 0x00000000006b9d97 llvm::BitcodeReader::ParseConstants() + 339 10 llc 0x00000000006bc634 llvm::BitcodeReader::ParseFunctionBody(llvm::Function*) + 594 11 llc 0x00000000006c2d63 llvm::BitcodeReader::Materialize(llvm::GlobalValue*, std::string*) + 411 12 llc 0x00000000006c290d llvm::BitcodeReader::MaterializeModule(llvm::Module*, std::string*) + 195 13 llc 0x000000000144898e llvm::Module::MaterializeAll(std::string*) + 78 14 llc 0x00000000014489b7 llvm::Module::MaterializeAllPermanently(std::string*) + 29 15 llc 0x00000000006c367b llvm::ParseBitcodeFile(llvm::MemoryBuffer*, llvm::LLVMContext&, std::string*) + 93 16 llc 0x0000000000679302 llvm::ParseIR(llvm::MemoryBuffer*, llvm::SMDiagnostic&, llvm::LLVMContext&) + 214 17 llc 0x000000000067958a llvm::ParseIRFile(std::string const&, llvm::SMDiagnostic&, llvm::LLVMContext&) + 374 18 llc 0x00000000006680f4 19 llc 0x0000000000668fed main + 199 20 libc.so.6 0x000000332fa1d994 __libc_start_main + 244 21 llc 0x0000000000666319 Stack dump: 0. 
Program arguments: llc -cppgen=program bzip2.del.opt.bc -------------- next part -------------- A non-text attachment was scrubbed... Name: bzip2.del.bc.tgz Type: application/x-gzip Size: 129266 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: bzip2.del.opt.bc.tgz Type: application/x-gzip Size: 133379 bytes Desc: not available URL: From garg11 at illinois.edu Tue Jul 30 20:26:07 2013 From: garg11 at illinois.edu (Garg, Pranav) Date: Tue, 30 Jul 2013 22:26:07 -0500 Subject: [LLVMdev] Error building compiler-rt Message-ID: <51F883CF.8010707@illinois.edu> Hi, I am trying to build llvm along with clang and compiler-rt. When I run make, I get the following compilation error: ... COMPILE: clang_linux/full-x86_64/x86_64:/home/pranav/smack-project/llvm/src/projects/compiler-rt/lib/enable_execute_stack.c /home/pranav/smack-project/llvm/src/projects/compiler-rt/lib/enable_execute_stack.c:53:29: error: cast to 'unsigned char *' from smaller integer type 'unsigned int' [-Werror,-Wint-to-pointer-cast] unsigned char* startPage = (unsigned char*)(p & pageAlignMask); ^ /home/pranav/smack-project/llvm/src/projects/compiler-rt/lib/enable_execute_stack.c:54:27: error: cast to 'unsigned char *' from smaller integer type 'unsigned int' [-Werror,-Wint-to-pointer-cast] unsigned char* endPage = (unsigned char*)((p+TRAMPOLINE_SIZE+pageSize) & pageAlignMask); ^ 2 errors generated. ... I am using gcc-4.6.3 on Ubuntu 12.04.1 LTS. Any ideas as to how to resolve this error? Thanks Pranav