Closed Bug 565452 Opened 14 years ago Closed 14 years ago

Lots of seemingly unnecessary text change events; very slow code to generate them

Categories

(Core :: Disability Access APIs, defect)

x86_64
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: jcblake, Unassigned)

References

(Blocks 1 open bug, )

Details

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100403 Firefox/3.6.3 (Palemoon/3.6.3)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 6.1; WOW64; en-US; rv:1.9.3a5pre) Gecko/20100511 Minefield/3.7a5pre ID:20100511040317

This 'javascript raytracer' utilizes javascript, but is actually more of a <div> element torture test, as it uses around 59,000 <div> elements to render its scene. Loading the page and performing a "Full Render" with the latest Minefield nightly results in wildly differing performance, depending on  whether (I suspect) Minefield is run in a 64-bit/WOW64 environment or native 32-bit.

On WinXP (running in a VM no less) I get a full render in just under 15 seconds.
On Win7 Business 64-bit, the same full render takes well over 450 seconds to complete.


http://nontroppo.org/timer/progressive_raytracer.html

Reproducible: Always

Steps to Reproduce:
1. Install Minefield 3.7a5pre nightly with fresh profile on Win7 64-bit. Run.
2. Load URL in question: http://nontroppo.org/timer/progressive_raytracer.html
3. Press "Full Render" button. (You must fully load/refresh the page between runs to get reproducible score)
4. Note performance score in top right corner of page.
5. Repeat steps 1-4 on WinXP 32-bit.

Actual Results:  
On Win7 64-bit, a full render takes over 450 seconds, on WinXp 32-bit, score is under 15 seconds.


Expected Results:  
Similar scores for both OSes.
Version: unspecified → Trunk
If you load the page in a hidden tab (i.e., have another tab open in the same window, and make the other one visible), do you see the same performance difference?
Yes. That doesn't seem to make a difference.
It's odd that you're seeing the long render time in a trunk build.  See bug 424715, where the trunk work to make this fast happened...

Just to make sure, you were definitely seeing this on trunk?

If so, then some other obvious questions:

1)  Do you have a pen input device on the Win7 box?
2)  Do you have Kaspersky Antivirus installed on the Win7 box?

If the answer to either of those is "yes" (or heck, just in any case), what happens if you set the "accessibility.win32.force_disabled" boolean preference to true?
It is definitely a trunk build: Mozilla/5.0 (Windows; U; Windows NT 6.1; WOW64; en-US; rv:1.9.3a5pre) Gecko/20100511 Minefield/3.7a5pre

No pen input and no AV products are installed, but I set the preference you mentioned anyway and now it completes in 12 seconds. Problem solved, but I'm scratching my head at that one.
OK.  So _something_ on your system is enabling our accessibility module with its ensuing molasses-behavior...  I'll try to find time to profile this, but my money is that this is more of bug 538594.

That leaves the question of what exactly calls into a11y in your case.  David, is there any sane way to determine that?
Blocks: a11yperf
Component: Layout → Disability Access APIs
Depends on: a11yperfcoalesce
QA Contact: layout → accessibility-apis
And confirming.  David, see question in comment 5 in case you don't get mail for UNCO bugs.
Status: UNCONFIRMED → NEW
Ever confirmed: true
I killed all non-essential Windows services and set "accessibility.win32.force_disabled" back to false, restarted Minefield and got fast rendering. So I will spend a couple days checking these one by one to see if I can pinpoint the program that's enabling "A11y".
That would be much appreciated.  Thank you!
It's Roboform. When "C:\Program Files (x86)\Siber Systems\AI RoboForm\RoboTaskBarIcon.exe" is running, it renders slowly. When I close it, it renders fast. I'm running version 7.0.4 Pro (which is a beta).

To be clear, I do not believe it has to do with the extension that Roboform installs automatically into each firefox profile, as I confirmed that the "AI Roboform Toolbar for Firefox" extension remains "disabled" both with and without RoboTaskBarIcon.exe running.
Thank you. So it's dupe of bug 521973.
That appears to be the case.
Perhaps we should also be investigating why enabling Accessibility is a 30x slowdown, and trying to fix it?
(In reply to comment #12)
> Perhaps we should also be investigating why enabling Accessibility is a 30x
> slowdown, and trying to fix it?

Yes, it makes sense. This bug is blocking a11yperf bug, so we can keep it open until investigation is done.
OK.  So I profiled this on Mac after enabling accessibility.

This has nothing to do with bug 538594; it's a totally different issue.

You can even see it on the "Basic Render" test, which is "only" 5x slower with a11y on.

On the "Basic Render" test we spend 72% of our time under nsAccEventQueue::Flush.  On the "Full Render" it's closer to 93%.  All of that time is actually spent under CreateTextChangeEventForNode, and in particular under nsHyperTextAccessible::DOMPointToHyperTextOffset.  The profile is actually simple enough to paste in its entirety.  "Self" is the percentage of time under DOMPointToHyperTextOffset spent in the function itself.  "Total" is the time spent in the function and other functions it calls; I blamed various cycle collector stuff and addref/release to callers to make it clearer what's going on.  I also blamed nsComptr_base::assign_from_qi on callers, which is why GetRole has a large self number there; in reality something like 20% of total time is in (not under; that's bigger!) assign_from_qi.

  Self        Total
  4.4%        100.0% nsHyperTextAccessible::DOMPointToHypertextOffset
 18.9%         94.4%   nsAccUtils::TextLength
 35.4%         75.4%     nsAccessible::GetRole
 16.5%         18.8%       nsHTMLDivElement::QueryInterface
  7.6%         15.1%       nsAccessNode::IsDefunct
  6.1%          6.1%       nsCOMPtr_base::~nsCOMPtr_base()

There's no text involved here.  Why are we even ending up in this code, much less what looks like a bazillion times?  assign_from_qi is not a very long function.

Those QIs are coming from GetRoleInternal, by the way.  It seems like we're QIing to nsIContent so we can check for IsXLink.  Why don't we just switch this code from nsIDOMNode to nsINode?  And replace the IsXLink test with something sane (e.g. the same sort of test for link-iness that style system uses)?

Other questions:

1)  How often does the role of an nsIAccessible change?  Do we already need
    to detect that?  If so, should the role just be a member instead of a
    slow-to-compute function?
2)  Is there a sane way to get rid of that nasty loop in
    DOMPointToHypertextOffset?
No longer depends on: a11yperfcoalesce
Summary: Rendering of large number of <div> elements very slow on Windows 7 64-bit (WOW64) → Lots of seemingly unnecessary text change events; very slow code to generate them
(In reply to comment #14)
 
> On the "Basic Render" test we spend 72% of our time under
> nsAccEventQueue::Flush.  On the "Full Render" it's closer to 93%.  All of that
> time is actually spent under CreateTextChangeEventForNode, 

I have an idea how to improve CreateTextChangeEventForNode at least in two times. I'll come with the patch soon.

>   Self        Total
>   4.4%        100.0% nsHyperTextAccessible::DOMPointToHypertextOffset
>  18.9%         94.4%   nsAccUtils::TextLength

recently I landed patch that get rids excess query interface from TextLength(). Also I filed patch in bug 541618 that removes lot of excess query interfaces. The things must be much better soon. 

> Other questions:
> 
> 1)  How often does the role of an nsIAccessible change?  Do we already need
>     to detect that?  If so, should the role just be a member instead of a
>     slow-to-compute function?

Theoretically role changing should mean accessible recreation because there's no way to notify AT the role was changed but this is primary thing they rely on. But I afraid it doesn't happen always in practice. I would be very careful to change this right now since it might require significant amount of time to ensure we do not regress AT.

However I think we can get rid nsAccUtils::Role() (http://mxr.mozilla.org/mozilla-central/source/accessible/src/base/nsAccUtils.h#265) and replace it by plain nsAccessible::Role() method if it helps.

> 2)  Is there a sane way to get rid of that nasty loop in
>     DOMPointToHypertextOffset?

I wish. That's really huge problem for a11y preformance. I'll file a bug and describe what's this method about and hopefully you get some ideas how to improve this. Unfortunately today I don't have anything really good.

Thank you much for debugging and help.
> recently I landed patch that get rids excess query interface from TextLength().

Note that the fundamental problem here is not just that TextLength is slow (which it may or may not be) but that the underlying algorithm calls it O(N) times and it itself is at least O(N).  So we lose.  Changing the constant is good, but can we avoid the bad algorithmic behavior too?

Please mark any bugs that are relevant as blocking this one (including the one you file on improving CreateTextChangeEventForNode)?

> I'll file a bug and describe what's this method about

Sounds great.  Again, blocking this one?
(In reply to comment #17)
> > recently I landed patch that get rids excess query interface from TextLength().
> 
> Note that the fundamental problem here is not just that TextLength is slow
> (which it may or may not be) but that the underlying algorithm calls it O(N)
> times and it itself is at least O(N).  So we lose.  Changing the constant is
> good, but can we avoid the bad algorithmic behavior too?

Yes, sure. I realize TextLenght isn't bottle neck, just all changes done can give more friendly output of perf tool I hope.

> Please mark any bugs that are relevant as blocking this one (including the one
> you file on improving CreateTextChangeEventForNode)?
> 
> > I'll file a bug and describe what's this method about
> 
> Sounds great.  Again, blocking this one?

Yes, I'll do.
Depends on: 570710
(In reply to comment #16)
> > 2)  Is there a sane way to get rid of that nasty loop in
> >     DOMPointToHypertextOffset?
> 
> I wish. That's really huge problem for a11y preformance. I'll file a bug

done, bug 575052. Try server build containing patch from bug 575052 (and few ones additionally) shows 20 seconds (with enabled a11y) against 18 seconds (without enabled a11y) for full render test on my machine.
Depends on: 575052
There we go!  That's a lot more like it.
I tested this test case again with the latest nightly (Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b5pre) Gecko/20100822 Minefield/4.0b5pre) and even with accessibility off, it's rendering in about 276 seconds on the same system -- about a 20x performance regression from what I've seen it do. No longer looks to be an ally-related slow down. Should I file a new bug?
I think this makes sense if a11y is not a cause of the slowness. So please file new bug and refer to this bug there. 

Marked fixed by bug 570500.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
> Should I file a new bug?

You're seeing bug 585258, almost certainly.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: