unsafe destructors

krome
Hi again Marco...

I have found another problem in DOA which I feel definitely needs to be addressed in the next patch/upgrade. This one caused all kinds of havoc for us here until I was finally able to nail down the source of the problem.

We are using DOA 3.4.6 in a multi-tier scenario (MIDAS) with the Delphi 5 compiler. Basically, components run on application servers (inside MTS) which access the Oracle databases, and several tiers of business and GUI components access those server components from other computers. It's actually much more complicated than that, but those are the basics of our setup.

The symptoms we have been seeing are very sporadic... and wouldn't you know it, they didn't surface in the Dev/QA environments often enough to draw attention... but in the production environment they occur very frequently. We have to deal with it roughly 5-10 times per day on average, and the servers are not even under significant load yet (10 concurrent users max per server). What happens is that at some seemingly random point, when a server MtsDatamodule is destroyed by COM+ Services, the associated process suddenly begins to run at 100% CPU utilization and consumes available memory at a rate of ~200k per second, until the server's available virtual memory pool is exhausted a couple of hours later and the operating system finally goes belly-up. During this period, the COM+ Manager utility shows zero instances of the component. This only happens in MTS modules which make use of DOA components.

We also have much older MIDAS servers which run as executables (these are getting converted to MTS as time permits) and which contain DOA components. These programs have a history of "hanging" during automatic shutdown, and I assume it's actually due to the same problem we see in the MTS configurations, just manifested in a different way.

Basically, the problem (as best I have been able to determine) is that the destructors in the DOA components are currently not bulletproof enough. I am not sure yet which component exactly is the offender, but it is most likely the TOracleSession or TOracleDataset (it could even be a problem with all DOA components).

The problem occurs when an exception is raised during the overridden destructor of one of these components. The inherited destructor then never executes and, more importantly, the destructor of the parent datamodule stops executing, so the remaining component instances never get freed, and neither does the datamodule itself. COM+ Services is in charge of the object destruction at this point, and it thinks the datamodule has been successfully released; however, it is still partially instantiated, and the DLL is therefore never allowed to be unloaded. I have no idea what happens at that point, other than that we see the process (it's actually a dllhost.exe process, the COM+ hosting executable) spiral out of control.

You can very easily replicate the problem by raising an exception in the destructor of any DOA component which is contained by a TMtsDatamodule registered in COM+ Services.
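
The VCL half of this can be seen without MTS or DOA at all. The console program below is only a sketch: TFailingComponent and TQuietComponent are hypothetical stand-ins (not DOA classes), and it assumes the owner destroys its owned components last-created-first, as the VCL versions I know of do.

```pascal
program DestructorDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Hypothetical stand-in for a component whose destructor can fail.
  TFailingComponent = class(TComponent)
  public
    destructor Destroy; override;
  end;

  // Hypothetical well-behaved sibling owned by the same parent.
  TQuietComponent = class(TComponent)
  public
    destructor Destroy; override;
  end;

destructor TFailingComponent.Destroy;
begin
  raise Exception.Create('simulated failure during cleanup');
  inherited Destroy; // never reached once the exception is raised
end;

destructor TQuietComponent.Destroy;
begin
  Writeln('TQuietComponent destroyed');
  inherited Destroy;
end;

var
  Module: TComponent;
begin
  Module := TComponent.Create(nil);   // plays the role of the datamodule
  TQuietComponent.Create(Module);     // created first
  TFailingComponent.Create(Module);   // created last
  try
    Module.Free;
  except
    on E: Exception do
      Writeln('Module.Free raised: ', E.Message);
  end;
end.
```

If the owner's destruction stops at the first failure as described above, the 'TQuietComponent destroyed' line never appears, and both the sibling and the owner are left half-destroyed.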

To fix this, we ended up patching TOracleSession, TOracleDataset, and TOracleQuery (the only components from DOA which we use). For now, we have wrapped every section of those destructors with try..except handlers, logging any exceptions to OutputDebugString (but not re-raising the exceptions). This ensures that the destruction sequence is unhindered, even when an error is encountered, and allows the host process to continue operating normally. We intend to run a debugging tool to monitor the output from OutputDebugString in production for a while, and perhaps we can more accurately pinpoint the offender component in the next few weeks.

I really don't know of any other way to correct the problem besides trapping and ignoring errors in the component destructors. At the point this occurs, COM+ Services has decided the component is no longer needed (refcount reached zero) and that it must be destroyed. If an error occurs during the destruction process, it really doesn't care... as long as the destruction is fully completed.
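
Reduced to its bare bones, the trap-and-ignore idea looks something like the sketch below. TGuardedComponent and its Disconnect method are hypothetical names used for illustration only, standing in for any cleanup step that may raise; this is not the DOA source.

```pascal
program GuardedDestructorDemo;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils, Classes;

type
  // Hypothetical component; Disconnect stands in for a cleanup step
  // (such as a logoff) that can raise when the connection is already gone.
  TGuardedComponent = class(TComponent)
  private
    procedure Disconnect;
  public
    destructor Destroy; override;
  end;

procedure TGuardedComponent.Disconnect;
begin
  raise Exception.Create('simulated: connection lost');
end;

destructor TGuardedComponent.Destroy;
begin
  try
    Disconnect;
  except
    on E: Exception do
      // Report the error and swallow it so the rest of the
      // destruction sequence still runs.
      OutputDebugString(PChar(Format('%s.Destroy: %s: %s',
        [ClassName, E.ClassName, E.Message])));
  end;
  inherited Destroy; // always reached
end;

var
  Module: TComponent;
begin
  Module := TComponent.Create(nil);
  TGuardedComponent.Create(Module);
  Module.Free; // the exception never escapes the component's destructor
  Writeln('owner freed');
end.
```

The trade-off is that the error is only visible to whatever is listening for OutputDebugString messages, so this hides the failure rather than fixing its cause.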

Is it possible to get this corrected in the next patch/upgrade of DOA? It's a significant problem from our perspective, and it prevents us from using the components in a production environment without patches. I also can't imagine we would be the only DOA customer experiencing this.
 
We'll try to address this in the next release. Can you send me an example by e-mail that shows how you worked around this problem?

Thanks in advance.

------------------
Marco Kalter
Allround Automations
 
Hi!

I had a problem with the TMonitorClient destructor.

When the application is started under the system account, and under some undefined conditions from the console, an access violation exception is raised at
TMonitorClient.Destroy
:004BBCC0 53 push ebx
:004BBCC1 56 push esi
:004BBCC2 E82D78F4FF call BeforeDestruction
:004BBCC7 8BDA mov ebx,edx
:004BBCC9 8BF0 mov esi,eax
:004BBCCB 8B4638 mov eax,[esi+38]
:004BBCCE►8B4010 mov eax,[eax+10]
_________^^^^^^^^^^^^^^^^^^^________________
at this point.

Hope this will help to fix the bug.
Thank you,
--
Regards, Tim
 
I don't think I ever got an email through, but this is the code we are currently using to alleviate the problem. It's by no means a permanent fix, and it does not even address the real problem of what is raising the errors (the debug information indicates that a firewall is killing the database connection during component destruction, and TOracleSession chokes trying to execute the LogOff method), but it at least allows the session to be destroyed and sends a little debugging info to an external monitoring tool (like dbwin32 or debugview, anything capable of reading OutputDebugString messages):

Code:
destructor TOracleSession.Destroy;
var i: Integer;
begin
  try
    LogOff;
  except
    on E:Exception do
      OutputDebugString(PChar(Format('DOA ERROR! ------------> [%s]%s exception %s: %s'#13#10, ['TOracleSession.Destroy', 'LogOff', E.ClassName, E.Message])));
  end;
  try
    FDBMS_Alert.Free;
    FDBMS_Application_Info.Free;
    FDBMS_Job.Free;
    FDBMS_Output.Free;
    FDBMS_Pipe.Free;
    FUTL_File.Free;
  except
    on E:Exception do
      OutputDebugString(PChar(Format('DOA ERROR! ------------> [%s]%s exception %s: %s'#13#10, ['TOracleSession.Destroy', 'Internal Free(s)', E.ClassName, E.Message])));
  end;
  try
    with Queries.LockList do
    try
      for i := Count - 1 downto 0 do TOracleQuery(Items[i]).Session := nil;
    finally
      Queries.UnlockList;
    end;
    with Packages.LockList do
    try
      for i := Count - 1 downto 0 do TOraclePackage(Items[i]).Session := nil;
    finally
      Packages.UnlockList;
    end;
    with Scripts.LockList do
    try
      for i := Count - 1 downto 0 do TOracleScript(Items[i]).Session := nil;
    finally
      Scripts.UnlockList;
    end;
    with Loaders.LockList do
    try
      for i := Count - 1 downto 0 do TOracleDirectPathLoader(Items[i]).Session := nil;
    finally
      Loaders.UnlockList;
    end;
    with CustomPackages.LockList do
    try
      for i := Count - 1 downto 0 do TOracleCustomPackage(Items[i]).Session := nil;
    finally
      CustomPackages.UnlockList;
    end;
  except
    on E:Exception do
      OutputDebugString(PChar(Format('DOA ERROR! ------------> [%s]%s exception %s: %s'#13#10, ['TOracleSession.Destroy', 'Session Disconnects', E.ClassName, E.Message])));
  end;
  try
    if Query <> nil then Query.Free;
    Queries.Free;
    DataSets.Free;
    Packages.Free;
    Scripts.Free;
    Loaders.Free;
    CustomPackages.Free;
    Events.Free;
    LOBLocators.Free;
    Objects.Free;
    References.Free;
    CriticalSection.Free;
  except
    on E:Exception do
      OutputDebugString(PChar(Format('DOA ERROR! ------------> [%s]%s exception %s: %s'#13#10, ['TOracleSession.Destroy', 'Internal Free(s) 2', E.ClassName, E.Message])));
  end;
  try
    if UseOCI80 and (envhp <> nil) then
    begin
      OCICall(OCIHandleFree(srvhp, OCI_HTYPE_SERVER));
      OCICall(OCIHandleFree(svchp, OCI_HTYPE_SVCCTX));
      OCICall(OCIHandleFree(errhp, OCI_HTYPE_ERROR));
      OCICall(OCIHandleFree(secerrhp, OCI_HTYPE_ERROR));
      OCICall(OCIHandleFree(authp, OCI_HTYPE_SESSION));
      OCICall(OCIHandleFree(envhp, OCI_HTYPE_ENV));
    end;
    OracleTableInfoList.Free;
    FPreferences.Free;
    SendToMonitor(False, True);
  except
    on E:Exception do
      OutputDebugString(PChar(Format('DOA ERROR! ------------> [%s]%s exception %s: %s'#13#10, ['TOracleSession.Destroy', 'API Releases', E.ClassName, E.Message])));
  end;
  try
    if AllSessions <> nil then AllSessions.Remove(Self);
    if FMonitorParameters <> nil then FMonitorParameters.Free;
  except
    on E:Exception do
      OutputDebugString(PChar(Format('DOA ERROR! ------------> [%s]%s exception %s: %s'#13#10, ['TOracleSession.Destroy', 'Remove Session', E.ClassName, E.Message])));
  end;
  inherited Destroy;
end;

The actual error message recorded is:

Oracle error ORA-03113 "End of file on communication channel"
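
The handlers in the destructor above differ only in the step name, so the logging could be pulled into one routine. A possible sketch follows; the unit name DestroyLog and the procedure LogDestroyError are made up for illustration and are not part of DOA.

```pascal
unit DestroyLog;

interface

uses
  SysUtils;

// Hypothetical helper, not part of DOA: sends one line describing a
// failed destructor step to the debug output.
procedure LogDestroyError(const AMethod, AStep: string; E: Exception);

implementation

uses
  Windows;

procedure LogDestroyError(const AMethod, AStep: string; E: Exception);
begin
  // Same message layout as the handlers in the patched destructor.
  OutputDebugString(PChar(Format(
    'DOA ERROR! ------------> [%s]%s exception %s: %s'#13#10,
    [AMethod, AStep, E.ClassName, E.Message])));
end;

// Each handler in the destructor would then shrink to:
//   on E: Exception do
//     LogDestroyError('TOracleSession.Destroy', 'LogOff', E);

end.
```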

[This message has been edited by krome (edited 17 April 2003).]
 