[fixed] ACE 5.8.3 is not stable?

Auntie Mangos · January 1, 1970

Unfortunately, it has nothing to do with new version.

This issue is pretty ancient, and no one have a lead.

http://getmangos.eu/community/topic/15569/crash-in-new-ace/

Undergarun · December 19, 2010

I am having this issue since some time ago.

http://pastebin.com/cX0bxrjK

Any help is welcome!

Undergarun · December 21, 2010

http://pastebin.com/mB4xb0mv

Anybody knows where the problem is?

Mangos Version: [10899]

Custom Patches: Visibility and reloc by Silverice (branch 2) + mtmaps + Timer system improved (Ambal version)

SD2 Version: [1913] + RSA ICC

Database Name and Version : YTDB 577

qsa · December 21, 2010

Spend some time on this issue by now.

Still nothing.

There are very few access points to msg_queue(), and its data.

I can't figure where ACE_Message_Block (which is element in ACE_Message_Queue) can get corrupted.

It really feels like threading issue.

I don't remember who made this code (migration to ACE lib) in the first place, but we sure could use his help right about now.

Ambal · December 21, 2010

I don't remember who made this code (migration to ACE lib) in the first place, but we sure could use his help right about now.

It was Derex who moved network code on ACE, but he is missing from long ago

Ambal · December 21, 2010

I think this piece of code might be related to your issue:

typedef ACE_Svc_Handler<ACE_SOCK_STREAM, ACE_NULL_SYNCH> WorldHandler;

try to change it to

typedef ACE_Svc_Handler<ACE_SOCK_STREAM, ACE_MT_SYNCH> WorldHandler;

and report if this was helpful

Undergarun · December 22, 2010

Your patch didn't work

The problem still exists changing ACE_NULL_SYNCH to ACE_MT_SYNCH

http://pastebin.com/kamtymv2

Undergarun · December 27, 2010

I did a test compiling without ScriptDev2 and crashed were reduced but still crashing so its a Core problem. [Custom patches: mtmaps (2 threads) + TBB concurrent Vectors + Visibility and reloc (branch 2)]

@qsa Any new information about the crash?

@VladimirMangos & Ambal Any possible fix? You recomend to compile with ACE_MT_SYNCH in mtmaps enviroment?

@anybody Help is welcome!

P.D: I'm gonna continue trying to make mtmaps stable in mangos and continue sending feedback, in order to continue ambal's multithreading progress i recomend to add tbb concurrent vector and this lock patch https://github.com/kero99/mangos/commit/7d8a7c47822a760e7a6b938d140f420688e92fc0 to master branch because Spells updates are not thread safe.

Ambal · December 27, 2010

i recomend to add tbb concurrent vector and this lock patch to master branch because Spells updates are not thread safe.

1) Threading Building Blocks (TBB) library itself needs to be upgraded in mangos - it is almost a year as outdated and lacks tonns of performance and bug fixes.

2) Using the way of doing spell updates as in the Kero99 patch is not very efficient because it requires using global mutex, which can lag your server to the death... I'm preparing another solution "World Tasks" which will allow you to execute all complex and/or thread-unsafe functions in main thread w/o overcomplicating mangos code with locks etc. As an example, player far teleports are already using it in my repo, branch "world_tasks", and server will now not try to teleport player into dungeon if it is impossible. Stay tuned

Undergarun · December 27, 2010

1) Threading Building Blocks (TBB) library itself needs to be upgraded in mangos - it is almost a year as outdated and lacks tonns of performance and bug fixes.
2) Using the way of doing spell updates as in the Kero99 patch is not very efficient because it requires using global mutex, which can lag your server to the death... I'm preparing another solution "World Tasks" which will allow you to execute all complex and/or thread-unsafe functions in main thread w/o overcomplicating mangos code with locks etc. As an example, player far teleports are already using it in my repo, branch "world_tasks", and server will now not try to teleport player into dungeon if it is impossible. Stay tuned

AWESOME!! So multithreading model for mangos would be somethink like 1 main thread in which all thread_unsafe and thread_inplace opcodes are executed by WorldTaskManager and thread_safe opcodes in map::update() calls. Isn't it? Some months ago (Probably a year ago) i saw a patch that makes 1 thread per map and add the possibility to restart threads if an error happen in this thread sending all players in this map to the character selection menu, i think is in similar way on Blizzard realms. Anyway your branch is ready for testing?

qsa · December 27, 2010

That's great news.

I don't know if that going to affect this particular problem tho.

I'm pretty sure it exists even if you don't use any map-threading.

It's impossible to reproduce on local/small sever, since that query is used to buffer packets only when output buffer is full, so it takes some significant load even to make some usage of that section of code. Making it hell for debugging.

I can try lowering m_OutBuffer to force more usage of msg_queue(), but meh... even then its shooting in the dark.

If anyone can get this crash logged on valgrind, that sure would help.

Actually, as a temporary solution, you can increase m_OutBuffer size. If that wont cause problems elsewhere in the code, it will surely lower the queue usage, therefore lowering chance for that particular crash.

Maybe someone will have luck guessing what exactly is wrong. I can't see the problem

VladimirMangos · December 27, 2010

i saw a patch that makes 1 thread per map and add the possibility to restart threads if an error happen in this thread sending all players in this map to the character selection menu,

For same mangosd (for diff threads in same application) this is unsafe. You can't be sure that memory not corrupted because it shared by all threads in application. In retial servers cases you see not thread restart but map _server_ restart. In like case it safe (in way: at map server crash, relogin with last saved version character to main continents map server). We attempt move with multi-threading and etc in way that in final state will allow in clean way add support for cluster map-servers.

Ambal · December 27, 2010

So multithreading model for mangos would be somethink like 1 main thread in which all thread_unsafe and thread_inplace opcodes are executed by WorldTaskManager and thread_safe opcodes in map::update() calls.

Yes, you are right, Undergarun: we will try to put as much thread-safe code processing into Map::Update() as possible. All thread-unsafe or complex functionality will be processed in the main thread. Also, if mangos will become "multimangos" - all outer-world communications/requests will be processed as WorldTasks (login, logout, chat, map transfers etc).

As of reported crashes in networking code, I agree with qsa - this issue is not very easy to reproduce. So try to increase output buffer for packets maybe twice and report if your server stability has increased or not.

qsa · January 4, 2011

Well, increasing buffer sizes didn't do the trick.

While that particular crash "gone" (haven't seen it in along long while) we have different related issue.

EDIT: I was wrong, crash in ACE_Message_Block::total_size_and_length is still there

Even more sinister. Plain freezes. just after adding "WorldSocket::SendPacket enqueue_tail" to log.

Need more research on this one, maybe some asserts

kero99 · January 12, 2011

Derex Finally solved the problem: 48h without ACE crashes :

0ab04bd1b0624640ec3e108bd12508829be51ad6
src/game/WorldSocket.cpp |    1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/src/game/WorldSocket.cpp b/src/game/WorldSocket.cpp
index 8b59038..667b039 100644
--- a/src/game/WorldSocket.cpp
+++ b/src/game/WorldSocket.cpp
@@ -445,6 +445,7 @@ int WorldSocket::handle_close (ACE_HANDLE h, ACE_Reactor_Mask)
        m_Session = NULL;
    }

+    reactor()->remove_handler(this, ACE_Event_Handler::DONT_CALL | ACE_Event_Handler::ALL_EVENTS_MASK);
    return 0;
}

Original code from: https://github.com/TrinityCore/TrinityCore/commit/45459bed23b78fc619475ee2177e826ae4494cd7

Best regards

Ambal · January 12, 2011

Derex Finally solved the problem

Wooot! Lets hope problems will really go away with this change.

I'm also rewriting internals of our DB code to make it more simple, reliable, utilize less locks + add some performance by using dedicated DB connections for queries and transactions.

Cheers

VladimirMangos · January 13, 2011

Patch + mangos specific realmd changes in [11001]. Thank you. kero99 for pointing to patch and testing.

Feanordev · January 13, 2011

Yes Ambal they are gone - I have had 1 day uptimes before - Now its almost 3 days and still online. No ACE-related crashes since applying that after Derex's commit nice to see it becouse it was really annoying and seemed pretty random.

cyberium · January 14, 2011

I personaly don't understaind this patch.

Can anywone can explain me why it was not stable and what's that new line do?

reactor()->remove_handler(this, ACE_Event_Handler::DONT_CALL | ACE_Event_Handler::ALL_EVENTS_MASK);

For learning purpose

Tanks in advance.

faramir118 · January 14, 2011

The ACE Reactor is an event dispatcher. You can create an event handler to receive network IO events, IO exceptions, timer expiration, and some other stuff.

In order to get the Reactor to dispatch to your handler, you have to call register_handler - you can see this in WorldSocket::open

When mangos closes a socket, it eventually calls delete on the WorldSocket object. This means the reference passed to register_handler is invalid - the next time the reactor tries to access that handler, it crashes. The call to remove_handler prevents this

Sign In

[fixed] ACE 5.8.3 is not stable?

Recommended Posts

Auntie Mangos

Undergarun

Undergarun

qsa

Ambal

Ambal

Undergarun

Undergarun

Ambal

Undergarun

qsa

VladimirMangos

Ambal

qsa

kero99

Ambal

VladimirMangos

Feanordev

cyberium

faramir118

Recently Browsing 0 members

Contact Us

Repositories

Support

Wiki

Browse

Activity

My Activity Streams

Important Information