
    Crash on shutdown / restart


    madmax
    • Status: Completed
      Main Category: Core / Mangos Daemon
      Sub-Category: Core Crash
      Version: 21.2 Milestone: 21 (Current) Priority: Urgent
      Related to:
      Implemented Version: 0.20

    Server freezes in ACE during shutdown / restart

    I know there are already some issues about server shutdown crashes, but none dedicated to the ACE shutdown hang, which is the point of this issue.
    ----------------

    When shutting down mangosd via Ctrl+C, .server shutdown 0 or .server restart, the core currently hangs.

    With a version I was using before Christmas (from November), the core would show a message about ACE along the lines of "no error" and hang at that point.

    Now I get no error message, just a core hang. We need to find out what is causing the hang at the shutdown/restart point.



    User Feedback

    Recommended Comments

    Closing with Ctrl+c gives this error:
    [IMG]https://www.getmangos.eu/attachment.php?attachmentid=180[/IMG]


    From visual studio output:
    [QUOTE]The thread 0x10e4 has exited with code 0 (0x0).
    The thread 0x1040 has exited with code 0 (0x0).
    First-chance exception at 0x74F422CB (KernelBase.dll) in mangosd.exe: 0x40010005: Control-C.
    [/QUOTE]


    This looks like the issue I temporarily fixed some months ago, which was never fully fixed. It is triggered by using .server shutdown 0 or .server restart 0: the server is unable to make the call with a value of 0. As I recall, this affects all cores.

    Tip: try using a number higher than 0.


    Made some progress and got some source code to come up for the crash in ACE.

    Just putting this link here for antz - [url]http://stackoverflow.com/questions/1702172/terminating-a-thread-gracefully-not-using-terminatethread[/url]
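For reference, the general idea of graceful termination is to ask the thread to stop and then wait for it, instead of killing it from outside. Below is a minimal sketch in standard C++. It assumes std::thread rather than ACE, and the Worker class and its member names are made up for illustration; they are not taken from the MaNGOS sources.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical worker: loops until asked to stop, then returns normally,
// so its stack unwinds and its own cleanup code gets a chance to run.
class Worker {
public:
    void start() { thread_ = std::thread([this] { run(); }); }

    // Ask the loop to finish and wait for the thread to exit on its own.
    void stop() {
        stop_requested_.store(true);
        if (thread_.joinable()) {
            thread_.join();
        }
        assert(!thread_.joinable());
    }

    bool finished_cleanly() const { return finished_.load(); }

    ~Worker() { stop(); }

private:
    void run() {
        while (!stop_requested_.load()) {
            // Placeholder for real work; here the thread just idles.
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        finished_.store(true);  // reached only when the loop exits normally
    }

    std::atomic<bool> stop_requested_{false};
    std::atomic<bool> finished_{false};
    std::thread thread_;
};
```

Calling start() and later stop() on a Worker should leave finished_cleanly() returning true. Applying the same pattern in the core would mean expressing it through whatever thread wrapper ACE provides, which this sketch does not attempt.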


    Your link contains good info, but we're using ACE, not pure C++, so we are creating and terminating threads using ACE if I recall correctly. The information in your link would have to be applied to the ACE library or we'd have to dump ACE for threading. I am not sure how we would use CreateThread() alongside the ACE function to handle threads. Maybe it can be done and I am wrong, but I normally only do pure C++ for things like threading, so I know very little about ACE and threads.

    *EDIT*

    Also, threading DOES appear broken to an extent. I had been working on realmd not showing realms before my server died and I figured out that the ACE functions we call to create the threads were NOT working. They would create a thread and instantly exit it, resulting in no servers being shown. This is under Debian, not Windows.


    Here is the assert breaking execution (Win32, internal ACE, OS_NS_Thread.cpp:950):
    ACE_ASSERT (key_info.key_in_use ());
    It seems that a thread, while finishing gracefully, activates and requests a [URL="https://books.google.com.ua/books?id=dz6Fm8brtScC&pg=PA309&lpg=PA309&dq=ace+tss&source=bl&ots=nME3UZsZEX&sig=mQ24DAhA9NcTWiPo3CVcmQ6hBqM&hl=ru&sa=X&ei=PTTVVJbAGcK8ygPM3ICYDg&ved=0CDgQ6AEwAw#v=onepage&q=ace%20tss&f=false"]TSS key[/URL] that is no longer used by any thread (but why isn't it flagged correctly?). If this were an ACE bug, a bug report would probably already be [URL="http://bugzilla.dre.vanderbilt.edu/buglist.cgi?component=ACE%20Core&order=changeddate%20DESC%2Cbug_status%20DESC%2Cpriority%2Cassigned_to%2Cbug_id&product=ACE&query_based_on=&query_format=advanced&resolution=---"]here[/URL]. But it is more likely incorrect ACE usage. We need help from someone who has built something more complex than "Hello World" with ACE (Reactor + multithreading).

    Btw, one of the most powerful debugging and profiling tools is [URL="http://valgrind.org"]Valgrind[/URL]. It could probably help here even in more or less amateur hands. A virtual server would be useful for this, but only if it runs Linux.


    It's been several years, so please forgive the cobwebs and dust as I dredge up lost knowledge, but here's my thought:

    Since this server crash occurs at shutdown and it involves keys created by ACE, my money is on this error having something to do with the server logging. It seems something has gotten mixed up during the cleanup/shutdown process, possibly an incorrect value being returned to ACE by mangosd.

    Before ACE 6.3.0 was committed by Foereaper last month, the previous change was 6.1.7 committed in August. With this error not being reported until December, I'd say that points to something having changed with the server code. The other possibility could be the complications and pitfalls that arise when working with submodules in Git leading to source corruption.

    Has anyone tried compiling ACE with the Dump macros enabled or run MaNGOS in debug mode? Either or both would give more information than the console errors.

    Assuming the server crash doesn't completely prevent logging, try setting the logging level to verbose in the mangosd config. That would at least give an overview of what's happening during the shutdown process.

    Valgrind hasn't been used in years, since nearly all of our devs are Windows programmers, and I know of no comparable replacement for Windows that doesn't cost an arm and a leg. Antz may be able to demonstrate how to use Visual Studio to trace and debug MaNGOS with similar results.

    I agree with Olion's assessment. ACE has been regarded much as a plug-in module you simply add and then forget about it. This library is at the very heart of the server so it is wise for everyone who touches core functions to become familiar with it.

    By the way... When you're reporting a bug, its status should not be set to "Confirmed" until there is at least one other user replying with a similar experience.


    This has been confirmed by myself and antz. [U]In future please don't change my issue status[/U]. If I set it to confirmed, it is confirmed for a reason; in this case it has been seen and commented on by users in the dev channel on Skype.


    Nuke, I use Valgrind. It is one of the best in the Linux/Unix/BSD world. I do not know of something comparable on Windows that is free, open-source, or similar.

    As for the versioning, this is why I have continually warned against staying bleeding-edge with ACE and the like. Unless you run Gentoo or something similar, you are NOT going to have the latest ACE stuff. Binary distributions (Debian, Ubuntu, Fedora, etc.) update with new releases and possibly once between releases with a backport. This means that staying bleeding-edge can cause issues everywhere except Windows. Sure, you can just grab a binary download on Windows and further bloat the install, but this is not how Linux works. I believe some of our issues would go away if we fixed them and then chose a version of ACE to stick with for a period of time before doing a planned upgrade, which should come after testing. Kind of like the extended support release (ESR) model that so much software uses now.


    As it currently stands, I cannot reproduce this bug on Linux. I am running the latest code from the GitHub branch Rel20 with no SD2, no Eluna, no tools built, and debug mode enabled, e.g.:

    [code]cmake .. -DCMAKE_INSTALL_PREFIX=~/src/c++/mangoszero/bin/rel20/ -DSCRIPT_LIB_ELUNA=0 -DSCRIPT_LIB_SD2=0 -DBUILD_TOOLS=0 -DDEBUG=1[/code]

    I've tried .server restart 0, .server shutdown 0 and Ctrl+C, all to no avail. Do players need to be on the server for this to happen? Has anyone else managed to reproduce it on Linux?


    Well, after some tinkering with the sources, the following log can be obtained:
    Halting process...
    X
    CliRunnable: run() end! Thread=7084
    Master.Run() before final return, thread=7068
    [B]sMaster returned! Code=0, Thread=7068[/B]
    Dtor: number of threads: 0, last ID=6468, thread=7068
    Dtor: number of threads: 0, last ID=7064, thread=7068
    Destroying singleton class LFGQueue, thread=7068
    Destroying singleton class MassMailMgr, thread=7068
    Destroying singleton class AuctionBotConfig, thread=7068
    Destroying singleton class AuctionHouseBot, thread=7068
    Destroying singleton class TerrainManager, thread=7068
    Destroying singleton class OutdoorPvPMgr, thread=7068
    Destroying singleton class MapManager, thread=7068
    Destroying singleton class ObjectRegistry,enum MovementGeneratorType>, thread=7068
    Destroying singleton class ObjectRegistry,class std::allocator > >,class std::basic_string,class std::allocator > >, thread=7068
    Destroying singleton class CreatureEventAIMgr, thread=7068
    Destroying singleton class GMTicketMgr, thread=7068
    Destroying singleton class GuildMgr, thread=7068
    Destroying singleton class AuctionHouseMgr, thread=7068
    Destroying singleton class BattleGroundMgr, thread=7068
    Destroying singleton class WaypointManager, thread=7068
    Destroying singleton class GameEventMgr, thread=7068
    Destroying singleton class PoolManager, thread=7068
    Destroying singleton class CreatureLinkingMgr, thread=7068
    Destroying singleton class MapPersistentStateManager, thread=7068
    Destroying singleton class ScriptMgr, thread=7068
    Destroying singleton class ObjectMgr, thread=7068
    Destroying singleton class World, thread=7068
    Destroying singleton class Master, thread=7068
    Destroying singleton class Log, thread=7068
    Destroying singleton class Config, thread=7068
    ACE_Thread::setspecific() failed!: No error

    Here, the line in bold is from mangosd/Main.cpp, more precisely from extern int main(int argc, char** argv). After CliRunnable and WorldRunnable have shut down, the only remaining thread returns from main(), and the runtime then calls all the static destructors (the order is not guaranteed across source files, though in our case it looks logical). IIRC no substantial work is done in these dtors, mostly freeing memory and resetting pointers. It is not a big problem if some of the dtors are skipped, since the OS will free the memory anyway. So this is a rare case where a symptomatic cure should be applied instead of the correct one, which would be a major code redesign in a direction that is unclear to me (and for which we apparently have no resources). This is my version of, say, "a crude workaround" rather than "a cure". I ask anyone who has tried a 64-bit Windows build to extend the #ifdef accordingly, if needed.

    [url]https://github.com/mangoszero/server/pull/298[/url]
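To make the destructor ordering in the log above easier to picture: C++ destroys objects in the reverse order of their construction. The sketch below shows that rule with block-scope objects, because their destruction can be observed before the program ends; objects with static storage duration follow the same reverse-of-construction rule at exit. The Tracer type and the function names are hypothetical and only borrow three class names from the log for readability.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Shared record of destructor calls, in the order they happen.
std::vector<std::string>& destruction_log() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical stand-in for a singleton; it only records its own destruction.
struct Tracer {
    explicit Tracer(std::string n) : name(std::move(n)) { assert(!name.empty()); }
    ~Tracer() { destruction_log().push_back(name); }
    std::string name;
};

// Constructs three tracers in a known order and returns the order in which
// their destructors ran when the enclosing scope closed.
std::vector<std::string> observed_destruction_order() {
    destruction_log().clear();
    {
        Tracer config("Config");   // constructed first, destroyed last
        Tracer logger("Log");
        Tracer world("World");     // constructed last, destroyed first
    }
    return destruction_log();
}
```

Here observed_destruction_order() yields World, Log, Config, which matches the tail of the log, where Config is the last singleton torn down before ACE reports the failure.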


    Just found that on a server-type OS (unlike my Windows 7) the problem is different (it occurs when joining the thread just before destroying it), and the proposed "hack" does not help. Trying again...


    After some... say, experience with a newer Windows OS than 7, another simple enough hack was found unexpectedly. The idea is to build the ACE project with the define _WIN32_WINNT=0x0601. Below is a CMakeLists.txt patch, which I will have trouble uploading because it belongs to the submodule "dep".
    [ATTACH]191[/ATTACH]


    The fix for this issue has been [URL="https://github.com/mangoszero/server/pull/60"]PR'ed[/URL]. The problem was the destruction of various static ACE_TSS objects. The [URL="http://www.dre.vanderbilt.edu/Doxygen/5.7.9/html/ace/a00737.html#_details"]documentation[/URL] itself (the note) was a great hint.
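For readers unfamiliar with this class of problem, one common way to sidestep trouble from static objects being destroyed at exit is to construct the object on first use and never destroy it. The sketch below shows that pattern in plain standard C++. It is not the code from the pull request and uses no ACE calls; SessionCache and session_cache() are hypothetical names chosen for illustration.

```cpp
#include <cassert>

// Hypothetical process-wide state that other code may still touch during shutdown.
class SessionCache {
public:
    SessionCache() { ++constructed; }
    ~SessionCache() { ++destroyed; }

    static int constructed;  // how many instances were ever built
    static int destroyed;    // how many destructors have run
};
int SessionCache::constructed = 0;
int SessionCache::destroyed = 0;

// Construct on first use, never destroy: the instance lives on the heap and
// the pointer is deliberately leaked, so no destructor runs at process exit
// and late callers can never see a destroyed object.
SessionCache& session_cache() {
    static SessionCache* instance = new SessionCache;
    assert(instance != nullptr);
    return *instance;
}
```

Every call to session_cache() returns the same instance, and SessionCache::destroyed stays at 0 for the life of the process. The trade-off is that the destructor's work is skipped, which is only acceptable when the OS reclaims everything the object holds.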


    [quote=H0zen]The fix for this issue has been [URL="https://github.com/mangoszero/server/pull/60"]PR'ed[/URL]. The problem was the destruction of various static ACE_TSS objects. The [URL="http://www.dre.vanderbilt.edu/Doxygen/5.7.9/html/ace/a00737.html#_details"]documentation[/URL] itself (the note) was a great hint.[/quote]

    Nice one, H0zen :D

    I am going to create an effigy to your awesomeness and show my devotion by sacrificing MadMax before your altar :)
    Or would you prefer the standard virgin maid?

    (moments of madness, priceless) :D

    Keep up the awesome work, matey :D
