Jump to content

[Question] How can I read the CPU usage for a specific code part?


Guest crashuncle

Recommended Posts

Hello Mangos Community,

I found one of the major lag reasons on my Server which exist on Windows and Linux. I saw it clearly now in the Process explorer when I added a debug info when chars login (So I see only logins / errors in the console).

The CPU spike up each time a character login, its quite similiar (bit less) with a empty char DB so I guess its a Software problem. What I want to do is adding the CPU usage between every function of the login-phase to see which function need so much CPU usage / stuck the Server. (Is there a easy way to read it?)

Its maybe a quite primitive debugging but im pretty sure that it will work, I just want to get finally ride of this laggs :)

Thanks in advance,

Crashuncle

Link to comment
Share on other sites

The easiest I know, but still not very easy, way is to use a profiler like Rational Quantify / callgrind / DevPartner profiler (don't remember the exact name) / gprof etc ..

That sounds interesting, but I have no idea where I should start. Would this solution be supported by Windows and Linux?

Link to comment
Share on other sites

The biggest problem I've had with trying to profile mangos is the fact that you need a lot of testers to see what is really the problem. What may be the "slowest" part of the code for 5-10 testers may be completely different than the "slowest" part for 500-1000 testers. It also depends how spread out the players are, if all grids are loaded, how many creatures are loaded, how many spells are casting, etc.

Here is a picture of profiling I did...well a long time ago.

http://i30.photobucket.com/albums/c308/thentsc/profile_08_08.png

This is an example of what profilers will output. You have to understand what this information means in order to use it for anything. In this case it is showing that __allshl is taking up 19.75% of recorded execution time. This was caused by a bit shift on a 64 bit value. I don't remember exactly how it was changed but the output after the change looked like this.

http://i30.photobucket.com/albums/c308/thentsc/profile_09_08.png

Looks like a good preformance increase though you'll notice there are far less samples taken in the second profile. What if the testers weren't doing the same things as before? When it comes to profiling you really should use machines to replicate the exact circumstances for each profile (which may be nearly impossible with MaNGOS).

Link to comment
Share on other sites

The biggest problem I've had with trying to profile mangos is the fact that you need a lot of testers to see what is really the problem. What may be the "slowest" part of the code for 5-10 testers may be completely different than the "slowest" part for 500-1000 testers. It also depends how spread out the players are, if all grids are loaded, how many creatures are loaded, how many spells are casting, etc.

Here is a picture of profiling I did...well a long time ago.

http://i30.photobucket.com/albums/c308/thentsc/profile_08_08.png

This is an example of what profilers will output. You have to understand what this information means in order to use it for anything. In this case it is showing that __allshl is taking up 19.75% of recorded execution time. This was caused by a bit shift on a 64 bit value. I don't remember exactly how it was changed but the output after the change looked like this.

http://i30.photobucket.com/albums/c308/thentsc/profile_09_08.png

Looks like a good preformance increase though you'll notice there are far less samples taken in the second profile. What if the testers weren't doing the same things as before? When it comes to profiling you really should use machines to replicate the exact circumstances for each profile (which may be nearly impossible with MaNGOS).

I added already about 200 Debug lines to the Login-Codes in the Character Handler, that show me clearly that the Server lag when a character login. As long as no character login there are no lags (I simply disabled the realm server and it run for hours without a lag).

So there can be only a few possibility's...

-> Somewhere in the Character Code is a bug which make such lags / There is a bug in the character database (I tried a empty Char DB and got similiar lags, I even tried a SQL Server on a different machine)

-> Its related with the Map Loading (But then it shouldn't stop lagging when nobody login)

So its easy to replicate the exact circumstance by simply turning on the Server and let players login for about 15 minutes it lag without a break, we speak here not of small lags I mean lags which are up to 1 minute. Some even get disconnected cause they kinda "timeout".

As long as 1 player login its "ok" but when 2-3 ... 20-30 login at a similiar time then you can forget the Server for a while. So my idea was to add simply more debug lines which show how much CPU each function need. I added already lines which check the time difference between each function and it said me nothing except that a login take 10ms minimum and about 140 seconds maximum (Which are 140 seconds too much).

A lot of time take for example the guilds when the Server bootup or when the chars load itself, but as I said there are similiar problems even with a empty char database. On Linux and on Windows.

So there is no easy possibility to measure the CPU usage between specific code parts? I saw code examples on Google which I tried, but the problem was that it only showed the CPU usage before the login. This profile solution would be of course very useful, I use already a process explorer to see the single threads and some more information's. But when it show stuff like the "ntdll.dll" need X CPU then it don't help me much.

Link to comment
Share on other sites

Im wondering... Is the thread accessing the DB and the main worker thread the same? in that case, the thread would block whenever it has to access the db like...when chars log in they have to be loaded etc etc... This would put the entire server on hold, and blimey if there isnt some funky thread.sleep involved in that sequence :)

Link to comment
Share on other sites

Im wondering... Is the thread accessing the DB and the main worker thread the same? in that case, the thread would block whenever it has to access the db like...when chars log in they have to be loaded etc etc... This would put the entire server on hold, and blimey if there isnt some funky thread.sleep involved in that sequence :)

I think so, because when the Server run for example a query which remove a invalid spell from all chars then the Server stuck for 2-3 minutes (cause of a database query). That is a real problem when the database Server is busy, but I don't think that this is the reason for the login lags which I have cause they even appeared when I tried a local DB Server.

Link to comment
Share on other sites

So there is no easy possibility to measure the CPU usage between specific code parts? I saw code examples on Google which I tried, but the problem was that it only showed the CPU usage before the login. This profile solution would be of course very useful, I use already a process explorer to see the single threads and some more information's. But when it show stuff like the "ntdll.dll" need X CPU then it don't help me much.

Nope, not really. On Windows XP (which is what I used durring that profile) Ntdll.dll is the "system idle process". You WANT that to eat up more CPU cycles as it means your program is spending more time idle. Kernal is generally your gui / windows, and the other "unknown dlls" are pretty easy to figure out.

Profilers are able to see across threads by the way. They can measure which threads are eating up the most cpu time etc.

Link to comment
Share on other sites

is there any indexing enabled for the db whatsoever? cuz that would explain some of the crappy select/update performance...

Profilers are able to see across threads by the way. They can measure which threads are eating up the most cpu time etc.

not entirely sure how native profilers work tbh, although im sure they arent as revealing as the .net ones, which can benchmark the amount of time the application spend on each method hehe

Link to comment
Share on other sites

is there any indexing enabled for the db whatsoever? cuz that would explain some of the crappy select/update performance...
Yes we do use indexing everywhere .. we're actually a lot smarter than you think.
not entirely sure how native profilers work tbh, although im sure they arent as revealing as the .net ones, which can benchmark the amount of time the application spend on each method hehe
If you don't know what you're talking about then how can you be sure ? All of the profilers I mentioned give ample information on method timing , even the time spent on each line in the method ! Some can even tell you exactly how much a method took when it was called by a specific function .. not just average execution times.
Link to comment
Share on other sites

Visual Studio Team Suit's profiler is pretty detailed. It provides you with 2 basic methods.

Sampling - A profile technique that involves taking snap shots at preset intervals of whatever format you want (CPU cycles, MsTime, PreformanceCounters, FPU FLOPS, whatever you want really). You can do it in whatever intervals you want as well. Every 50,000 Cycles to every 5 billion.

Instrumentation - Inserts diagnostic probes into code. Collects A LOT of detailed data. Like seriously more information that you'll know what to do with.

Native Profilers have been around for a long time. I'd be surprised if there were less options in a native profiler than there are in .NET ones considering that .NET is eventually handled with native code.

Link to comment
Share on other sites

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue. Privacy Policy Terms of Use