Jump to content

[fixed] ACE 5.8.3 is not stable?


Auntie Mangos

Recommended Posts

  • 40 years later...

Spend some time on this issue by now.

Still nothing.

There are very few access points to msg_queue(), and its data.

I can't figure where ACE_Message_Block (which is element in ACE_Message_Queue) can get corrupted.

It really feels like threading issue.

I don't remember who made this code (migration to ACE lib) in the first place, but we sure could use his help right about now.

Link to comment
Share on other sites

I did a test compiling without ScriptDev2 and crashed were reduced but still crashing so its a Core problem. [Custom patches: mtmaps (2 threads) + TBB concurrent Vectors + Visibility and reloc (branch 2)] :(

@qsa Any new information about the crash?

@VladimirMangos & Ambal Any possible fix? You recomend to compile with ACE_MT_SYNCH in mtmaps enviroment?

@anybody Help is welcome!

P.D: I'm gonna continue trying to make mtmaps stable in mangos and continue sending feedback, in order to continue ambal's multithreading progress i recomend to add tbb concurrent vector and this lock patch https://github.com/kero99/mangos/commit/7d8a7c47822a760e7a6b938d140f420688e92fc0 to master branch because Spells updates are not thread safe.

Link to comment
Share on other sites

i recomend to add tbb concurrent vector and this lock patch to master branch because Spells updates are not thread safe.

1) Threading Building Blocks (TBB) library itself needs to be upgraded in mangos - it is almost a year as outdated and lacks tonns of performance and bug fixes.

2) Using the way of doing spell updates as in the Kero99 patch is not very efficient because it requires using global mutex, which can lag your server to the death... I'm preparing another solution "World Tasks" which will allow you to execute all complex and/or thread-unsafe functions in main thread w/o overcomplicating mangos code with locks etc. As an example, player far teleports are already using it in my repo, branch "world_tasks", and server will now not try to teleport player into dungeon if it is impossible. Stay tuned :)

Link to comment
Share on other sites

1) Threading Building Blocks (TBB) library itself needs to be upgraded in mangos - it is almost a year as outdated and lacks tonns of performance and bug fixes.

2) Using the way of doing spell updates as in the Kero99 patch is not very efficient because it requires using global mutex, which can lag your server to the death... I'm preparing another solution "World Tasks" which will allow you to execute all complex and/or thread-unsafe functions in main thread w/o overcomplicating mangos code with locks etc. As an example, player far teleports are already using it in my repo, branch "world_tasks", and server will now not try to teleport player into dungeon if it is impossible. Stay tuned :)

AWESOME!! So multithreading model for mangos would be somethink like 1 main thread in which all thread_unsafe and thread_inplace opcodes are executed by WorldTaskManager and thread_safe opcodes in map::update() calls. Isn't it? Some months ago (Probably a year ago) i saw a patch that makes 1 thread per map and add the possibility to restart threads if an error happen in this thread sending all players in this map to the character selection menu, i think is in similar way on Blizzard realms. Anyway your branch is ready for testing?

Link to comment
Share on other sites

That's great news.

I don't know if that going to affect this particular problem tho.

I'm pretty sure it exists even if you don't use any map-threading.

It's impossible to reproduce on local/small sever, since that query is used to buffer packets only when output buffer is full, so it takes some significant load even to make some usage of that section of code. Making it hell for debugging.

I can try lowering m_OutBuffer to force more usage of msg_queue(), but meh... even then its shooting in the dark.

If anyone can get this crash logged on valgrind, that sure would help.

Actually, as a temporary solution, you can increase m_OutBuffer size. If that wont cause problems elsewhere in the code, it will surely lower the queue usage, therefore lowering chance for that particular crash.

Maybe someone will have luck guessing what exactly is wrong. I can't see the problem :(

Link to comment
Share on other sites

i saw a patch that makes 1 thread per map and add the possibility to restart threads if an error happen in this thread sending all players in this map to the character selection menu,

For same mangosd (for diff threads in same application) this is unsafe. You can't be sure that memory not corrupted because it shared by all threads in application. In retial servers cases you see not thread restart but map _server_ restart. In like case it safe (in way: at map server crash, relogin with last saved version character to main continents map server). We attempt move with multi-threading and etc in way that in final state will allow in clean way add support for cluster map-servers.

Link to comment
Share on other sites

So multithreading model for mangos would be somethink like 1 main thread in which all thread_unsafe and thread_inplace opcodes are executed by WorldTaskManager and thread_safe opcodes in map::update() calls.

Yes, you are right, Undergarun: we will try to put as much thread-safe code processing into Map::Update() as possible. All thread-unsafe or complex functionality will be processed in the main thread. Also, if mangos will become "multimangos" - all outer-world communications/requests will be processed as WorldTasks (login, logout, chat, map transfers etc).

As of reported crashes in networking code, I agree with qsa - this issue is not very easy to reproduce. So try to increase output buffer for packets maybe twice and report if your server stability has increased or not.

Link to comment
Share on other sites

Well, increasing buffer sizes didn't do the trick.

While that particular crash "gone" (haven't seen it in along long while) we have different related issue.

EDIT: I was wrong, crash in ACE_Message_Block::total_size_and_length is still there :(

Even more sinister. Plain freezes. just after adding "WorldSocket::SendPacket enqueue_tail" to log.

Need more research on this one, maybe some asserts :(

Link to comment
Share on other sites

  • 2 weeks later...

Derex Finally solved the problem: 48h without ACE crashes :

0ab04bd1b0624640ec3e108bd12508829be51ad6
src/game/WorldSocket.cpp |    1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/src/game/WorldSocket.cpp b/src/game/WorldSocket.cpp
index 8b59038..667b039 100644
--- a/src/game/WorldSocket.cpp
+++ b/src/game/WorldSocket.cpp
@@ -445,6 +445,7 @@ int WorldSocket::handle_close (ACE_HANDLE h, ACE_Reactor_Mask)
        m_Session = NULL;
    }

+    reactor()->remove_handler(this, ACE_Event_Handler::DONT_CALL | ACE_Event_Handler::ALL_EVENTS_MASK);
    return 0;
}

Original code from: https://github.com/TrinityCore/TrinityCore/commit/45459bed23b78fc619475ee2177e826ae4494cd7

Best regards

Link to comment
Share on other sites

Derex Finally solved the problem

Wooot! :) Lets hope problems will really go away with this change.

I'm also rewriting internals of our DB code to make it more simple, reliable, utilize less locks + add some performance by using dedicated DB connections for queries and transactions.

Cheers :)

Link to comment
Share on other sites

The ACE Reactor is an event dispatcher. You can create an event handler to receive network IO events, IO exceptions, timer expiration, and some other stuff.

In order to get the Reactor to dispatch to your handler, you have to call register_handler - you can see this in WorldSocket::open

When mangos closes a socket, it eventually calls delete on the WorldSocket object. This means the reference passed to register_handler is invalid - the next time the reactor tries to access that handler, it crashes. The call to remove_handler prevents this

Link to comment
Share on other sites

Guest
This topic is now closed to further replies.
×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue. Privacy Policy Terms of Use