Perl - subscribe and learn!


Perl - subscribe and learn! - Perl performance in RedHat builds


    Greetings, everyone! (All 9867 subscribers, to be exact.)

 

 Today, some unpleasant news... :(

A bug has been found in RedHat's perl builds that severely degrades performance in programs and modules that use bless together with overload.
In short, try running the following little program:

#!/usr/bin/perl
use overload q(<) => sub {};
my %h;
for (my $i = 0; $i < 50000; $i++) {
    $h{$i} = bless [] => 'main';
    print STDERR '.' if $i % 1000 == 0;
}


If it ran in under a second, everything is fine; otherwise, your perl needs to be updated.
The frustrating part is that this was fixed upstream back in 2007, but the fix never made it into the RedHat builds.
As a result, Perl's reputation has taken a hit. Sad, so to speak.
Whether or not you hit this bug, please report your results here: http://forum.perl.dp.ua/cgi-bin/YaBB.pl?num=1220349336

P.S. On our own production servers running CentOS with perl 5.8.8, we found this bug; we have already updated, and now everything is fine.


 Source: http://blog.vipul.net/2008/08/24/redhat-perl-what-a-tragedy/

Redhat perl. What a tragedy.

Published by vipul on August 24, 2008 10:48 pm under Performance, Perl

At my new startup, Slaant, we use a lot of perl.  We use perl for parsing massive amounts of HTML/XML documents, we’ve written a homegrown RDF store in perl and we have a set of web applications built on Catalyst.  We use OS X boxes for development and Centos 5.2 for production.  Last week we deployed new hardware - over 150 cores of CPU - expecting much higher data processing throughput along the pipeline of 30 or so perl sub-systems - but the performance boost seemed marginal and perl stood out as the culprit.  This was over a year of work, running on production-scale hardware for the first time, and it wasn’t measuring up.  We almost threw perl out of the window.

We have one lone FreeBSD box in our production environment and I happened to notice that a perl program that read JSON structures from disk files was over 100x faster on this box compared to our Centos boxes.  It should’ve been bottlenecked on disk I/O, but strace showed it was burning userland CPU.  Surprised, I ran the fantastic Devel::NYTProf and discovered the most expensive call, by a big margin, was a “bless”.   bless!?!  Perl will happily do millions of blesses a second on my 2GHz MacBook.  And this was a dual 2.5GHz quad-core server.  What the hell?

Some investigation revealed that there’s a long standing bug in Redhat Perl that causes *severe* performance degradation on code that uses the bless/overload combo.  The thread on this is here: https://bugzilla.redhat.com/show_bug.cgi?id=379791

In the thread, ritz posted the following snippet.  Try it.  It should take under a second if the perl is not broken and a lot longer if it is.

#!/usr/bin/perl
use overload q(<) => sub {};
my %h;
for (my $i = 0; $i < 50000; $i++) {
    $h{$i} = bless [] => 'main';
    print STDERR '.' if $i % 1000 == 0;
}

There isn’t an official fix yet, but there’s a patch in the thread.  We applied the patch.  However, it did not make the problem go away, just delayed it - perl processes using bless/overload start slowing down (and continue to do so exponentially) after a while.  At this point, I decided to recompile perl from source.  The bug was gone.  And the difference was appalling.  Everything got seriously fast.  CPUs were chilling at a loadavg below 0.10 and we were processing data 100x to 1000x faster!  I was giddy.  This was insane. We’d given up on one of the processes - to parse about 25M HTML documents using an HTML::TreeBuilder::XPath parser - because calculations showed that it would take over a year to parse them all.  We assumed the Tree::XPathEngine was somehow intrinsically slow - so we’d rewrite our parsers using regexen at some point.  With the new perl, we parsed these documents in 2 days.  2 days, instead of 365 days.

Rather massively blown away by this, I started sending the snippet above to various companies and projects I am involved with that use a lot of perl on Redhat or related distributions. It turns out many of them are running the broken perl and some of them had spent a considerable amount of money and time optimizing their perl code and infrastructure to work around the performance issue. It also turns out the issue exists on the perl that comes with Fedora 9 - even if you compile perl from the Fedora 9 source package.

So, wow.  How many people might be affected by this?

I realized that anyone running perl code with the distribution perl interpreter on Redhat 5.2, Centos 5.2 or Fedora 9 is likely a victim. Yes, even if your code doesn’t use the fancy bless/overload idiom, many CPAN modules do!  This google search shows 1500+ modules use the bless/overload idiom and they include some really popular ones like URI, JSON.

According to this google trends analysis, Redhat, Centos and Fedora make up the majority of linux distributions used in production. All these have a broken perl. How much time and money has been lost because of this?  I have a sinking feeling that it is a staggering number.  I also have a sinking feeling that many people have moved away from perl to python/ruby/java/C because this bug caused them to assume “perl is slow”.  I am hoping this issue will get more visibility because it’s silently killing perl’s reputation and resulting in some very serious wastage of resources.

UPDATE: Nicholas Clark, perl core developer, explains the background and points out that fixes have been available since November 2007, they just haven’t made it into RedHat packages.

