[tools][trace] Add stopping event options to dotnet-trace (#4363)
authorMitchell Hwang <16830051+mdh1418@users.noreply.github.com>
Thu, 2 Nov 2023 23:07:01 +0000 (19:07 -0400)
committerGitHub <noreply@github.com>
Thu, 2 Nov 2023 23:07:01 +0000 (16:07 -0700)
Fixes https://github.com/dotnet/diagnostics/issues/3125

This PR provides users another method to stop a dotnet-trace via a
stopping event similar to that of dotnet-monitor, originally implemented
in https://github.com/dotnet/dotnet-monitor/pull/2557.

Three arguments are added to the `dotnet-trace collect` command to
specify a stopping event:
| Argument | Description |
|----------|----------|
|`--stopping-event-provider-name` | A string, parsed as-is, that will
stop the trace upon hitting an event with the matching provider name.
For a more specific stopping event, additionally provide
`--stopping-event-event-name` and/or `--stopping-event-payload-filter`.
|
| `--stopping-event-event-name` | A string, parsed as-is, that will stop
the trace upon hitting an event with the matching event name. Requires
`--stopping-event-provider-name` to be set. For a more specific stopping
event, additionally provide `--stopping-event-payload-filter`. |
| `--stopping-event-payload-filter` | A string, parsed as
[payload_field_name]:[payload_field_value] pairs separated by commas,
that will stop the trace upon hitting an event with a matching payload.
Requires `--stopping-event-provider-name` and
`--stopping-event-event-name` to be set. |

Note: Though technically `--stopping-event-payload-filter` can be set
without needing a `--stopping-event-event-name`, this may lead to
mismatched payloads should another `TraceEvent` under the same provider
not have that particular payload field name. Until there is a good
reason to stop a trace given a payload filter regardless of the event
name, we require `--stopping-event-event-name` to be set whenever
`--stopping-event-payload-filter` is provided.

To stop a trace at a particular event, dotnet-monitor's
[approach](https://github.com/dotnet/dotnet-monitor/blob/0820b6911f3ac47b6b5ec867ac906699e5c15787/src/Tools/dotnet-monitor/Trace/TraceUntilEventOperation.cs#L47)
using an
[EventMonitor](https://github.com/dotnet/diagnostics/blob/main/src/Microsoft.Diagnostics.Monitoring.EventPipe/EventMonitor.cs)
is adopted. Upon hitting a TraceEvent with a matching ProviderName,
EventName (if specified), and PayloadFilter (if specified), we trigger
dotnet-trace's fallback logic to stop the EventPipeSession before the
EventStream ends.

Note: As the EventStream is being parsed asynchronously, there will be
some events that pass through between the time a trace event matching
the specified stopping event arguments is parsed and the
EventPipeSession is stopped.

In addition, this PR modifies `EventMonitor` to use the
`ClrTraceEventParser` to parse `TraceEvent` objects under the
`Microsoft-Windows-DotNETRuntime` provider, and the
`DynamicTraceEventParser` otherwise. The `ClrTraceEventParser` is
generated to understand the ETW event manifest for
`Microsoft-Windows-DotNETRuntime` events. The previous implementation
defaulting to `DynamicTraceEventParser` would not work on non-Windows
platforms such as OSX which could not parse the payload to populate
`PayloadNames` and `PayloadValue(i)` because there was no manifest
available. On the other hand, Windows is able to locate manifest data
for known events through the OS.

-------------------

## Testing

With an Android App
``` C#
    private static void PrintA()
    {
        Console.WriteLine("A");
        Thread.Sleep(1000);
    }

    ...

    private static void PrintK()
    {
        Console.WriteLine("K");
        Thread.Sleep(1000);
    }

    public static void Main(string[] args)
    {
        Console.WriteLine("Hello, Android!"); // logcat
        PrintA();
        PrintB();
        PrintC();
        PrintD();
        PrintE();
        PrintF();
        PrintG();
        PrintH();
        PrintI();
        PrintJ();
        PrintK();

        while (true)
        {
            Thread.Sleep(100);
        }
    }
```

Running dotnet-dsrouter to connect the diagnostic tooling with the
android device
`./artifacts/bin/dotnet-dsrouter/Debug/net6.0/dotnet-dsrouter android -v
debug`

Tracing with a stopping event args provided
`./artifacts/bin/dotnet-trace/Debug/net6.0/dotnet-trace collect -p 16683
--providers Microsoft-Windows-DotNETRuntime:0x1F000080018:5
--stopping-event-provider-name Microsoft-Windows-DotNETRuntime
--stopping-event-event-name Method/JittingStarted
--stopping-event-payload-filter MethodName:PrintA`

[dotnet-dsrouter_20231024_165648.nettrace.zip](https://github.com/dotnet/diagnostics/files/13147788/dotnet-dsrouter_20231024_165648.nettrace.zip)

There are no `Method/JittingStarted` events with MethodName `PrintC`
through `PrintK` in the `.nettrace`, showing that the trace was indeed
stopped after seeing the `PrintA` event. The events after `PrintA` are
an artifact of the second note above. Below is the JITStats of the
.nettrace opened in perfview, showing that the last method was `PrintB`.

<img width="1128" alt="JITStatsPrintA"
src="https://github.com/dotnet/diagnostics/assets/16830051/1742baf4-b528-43c3-aef3-b00a576f2fb8">

src/Microsoft.Diagnostics.Monitoring.EventPipe/EventMonitor.cs
src/Microsoft.Diagnostics.Monitoring.EventPipe/Microsoft.Diagnostics.Monitoring.EventPipe.csproj
src/Tools/dotnet-trace/CommandLine/Commands/CollectCommand.cs
src/Tools/dotnet-trace/dotnet-trace.csproj

index 147167f5c69a55679bebe8c7c0b7e46938d5013e..5e773d239486010daafd1c5330b98adbad4953cf 100644 (file)
@@ -24,6 +24,8 @@ namespace Microsoft.Diagnostics.Monitoring.EventPipe
         private readonly bool _leaveEventStreamOpen;
         private EventPipeEventSource _eventSource;
 
+        private TraceEventParser _eventParser;
+
         private readonly string _providerName;
         private readonly string _eventName;
 
@@ -78,8 +80,16 @@ namespace Microsoft.Diagnostics.Monitoring.EventPipe
                 token.ThrowIfCancellationRequested();
                 using IDisposable registration = token.Register(() => _eventSource.Dispose());
 
-                _eventSource.Dynamic.AddCallbackForProviderEvent(_providerName, _eventName, TraceEventCallback);
+                if (string.Equals(_providerName, "Microsoft-Windows-DotNETRuntime", StringComparison.Ordinal))
+                {
+                    _eventParser = _eventSource.Clr;
+                }
+                else
+                {
+                    _eventParser = _eventSource.Dynamic;
+                }
 
+                _eventParser.AddCallbackForProviderEvent(_providerName, _eventName, TraceEventCallback);
                 _eventSource.Process();
                 token.ThrowIfCancellationRequested();
             }, token);
@@ -91,7 +101,7 @@ namespace Microsoft.Diagnostics.Monitoring.EventPipe
         /// </summary>
         private void StopMonitoringForEvent()
         {
-            _eventSource?.Dynamic.RemoveCallback<TraceEvent>(TraceEventCallback);
+            _eventParser?.RemoveCallback<TraceEvent>(TraceEventCallback);
         }
 
         private void TraceEventCallback(TraceEvent obj)
index c104003875542dffafc874bdcc1305a0eaa7f0f2..4558b2ed0c9a233b6a0568f5d1f83f44475b77c8 100644 (file)
@@ -40,6 +40,7 @@
   <ItemGroup>
     <InternalsVisibleTo Include="dotnet-monitor" />
     <InternalsVisibleTo Include="dotnet-counters" />
+    <InternalsVisibleTo Include="dotnet-trace" />
     <InternalsVisibleTo Include="DotnetCounters.UnitTests" />
     <InternalsVisibleTo Include="Microsoft.Diagnostics.Monitoring.EventPipe.UnitTests" />
     <InternalsVisibleTo Include="Microsoft.Diagnostics.Monitoring.WebApi" />
index 15eab9c79992e93ddfcee80bb39fe2c0800211c9..720373be6d2b05043a6f9b1ce49de0ee8c9566c9 100644 (file)
@@ -12,6 +12,7 @@ using System.Linq;
 using System.Runtime.InteropServices;
 using System.Threading;
 using System.Threading.Tasks;
+using Microsoft.Diagnostics.Monitoring.EventPipe;
 using Microsoft.Diagnostics.NETCore.Client;
 using Microsoft.Internal.Common.Utils;
 using Microsoft.Tools.Common;
@@ -31,7 +32,7 @@ namespace Microsoft.Diagnostics.Tools.Trace
             }
         }
 
-        private delegate Task<int> CollectDelegate(CancellationToken ct, IConsole console, int processId, FileInfo output, uint buffersize, string providers, string profile, TraceFileFormat format, TimeSpan duration, string clrevents, string clreventlevel, string name, string port, bool showchildio, bool resumeRuntime);
+        private delegate Task<int> CollectDelegate(CancellationToken ct, IConsole console, int processId, FileInfo output, uint buffersize, string providers, string profile, TraceFileFormat format, TimeSpan duration, string clrevents, string clreventlevel, string name, string port, bool showchildio, bool resumeRuntime, string stoppingEventProviderName, string stoppingEventEventName, string stoppingEventPayloadFilter);
 
         /// <summary>
         /// Collects a diagnostic trace from a currently running process or launch a child process and trace it.
@@ -52,8 +53,11 @@ namespace Microsoft.Diagnostics.Tools.Trace
         /// <param name="diagnosticPort">Path to the diagnostic port to be used.</param>
         /// <param name="showchildio">Should IO from a child process be hidden.</param>
         /// <param name="resumeRuntime">Resume runtime once session has been initialized.</param>
+        /// <param name="stoppingEventProviderName">A string, parsed as-is, that will stop the trace upon hitting an event with the matching provider name. For a more specific stopping event, additionally provide `--stopping-event-event-name` and/or `--stopping-event-payload-filter`.</param>
+        /// <param name="stoppingEventEventName">A string, parsed as-is, that will stop the trace upon hitting an event with the matching event name. Requires `--stopping-event-provider-name` to be set. For a more specific stopping event, additionally provide `--stopping-event-payload-filter`.</param>
+        /// <param name="stoppingEventPayloadFilter">A string, parsed as [payload_field_name]:[payload_field_value] pairs separated by commas, that will stop the trace upon hitting an event with a matching payload. Requires `--stopping-event-provider-name` and `--stopping-event-event-name` to be set.</param>
         /// <returns></returns>
-        private static async Task<int> Collect(CancellationToken ct, IConsole console, int processId, FileInfo output, uint buffersize, string providers, string profile, TraceFileFormat format, TimeSpan duration, string clrevents, string clreventlevel, string name, string diagnosticPort, bool showchildio, bool resumeRuntime)
+        private static async Task<int> Collect(CancellationToken ct, IConsole console, int processId, FileInfo output, uint buffersize, string providers, string profile, TraceFileFormat format, TimeSpan duration, string clrevents, string clreventlevel, string name, string diagnosticPort, bool showchildio, bool resumeRuntime, string stoppingEventProviderName, string stoppingEventEventName, string stoppingEventPayloadFilter)
         {
             bool collectionStopped = false;
             bool cancelOnEnter = true;
@@ -159,6 +163,39 @@ namespace Microsoft.Diagnostics.Tools.Trace
 
                 PrintProviders(providerCollection, enabledBy);
 
+                // Validate and parse stoppingEvent parameters: stoppingEventProviderName, stoppingEventEventName, stoppingEventPayloadFilter
+
+                bool hasStoppingEventProviderName = !string.IsNullOrEmpty(stoppingEventProviderName);
+                bool hasStoppingEventEventName = !string.IsNullOrEmpty(stoppingEventEventName);
+                bool hasStoppingEventPayloadFilter = !string.IsNullOrEmpty(stoppingEventPayloadFilter);
+                if (!hasStoppingEventProviderName && (hasStoppingEventEventName || hasStoppingEventPayloadFilter))
+                {
+                    Console.Error.WriteLine($"`{nameof(stoppingEventProviderName)}` is required to stop tracing after a specific event for a particular `{nameof(stoppingEventEventName)}` event name or `{nameof(stoppingEventPayloadFilter)}` payload filter.");
+                    return (int)ReturnCode.ArgumentError;
+                }
+                if (!hasStoppingEventEventName && hasStoppingEventPayloadFilter)
+                {
+                    Console.Error.WriteLine($"`{nameof(stoppingEventEventName)}` is required to stop tracing after a specific event for a particular `{nameof(stoppingEventPayloadFilter)}` payload filter.");
+                    return (int)ReturnCode.ArgumentError;
+                }
+
+                Dictionary<string, string> payloadFilter = new();
+                if (hasStoppingEventPayloadFilter)
+                {
+                    string[] payloadFieldNameValuePairs = stoppingEventPayloadFilter.Split(',');
+                    foreach (string pair in payloadFieldNameValuePairs)
+                    {
+                        string[] payloadFieldNameValuePair = pair.Split(':');
+                        if (payloadFieldNameValuePair.Length != 2)
+                        {
+                            Console.Error.WriteLine($"`{nameof(stoppingEventPayloadFilter)}` does not have valid format. Ensure that it has `payload_field_name:payload_field_value` pairs separated by commas.");
+                            return (int)ReturnCode.ArgumentError;
+                        }
+
+                        payloadFilter[payloadFieldNameValuePair[0]] = payloadFieldNameValuePair[1];
+                    }
+                }
+
                 DiagnosticsClient diagnosticsClient;
                 Process process = null;
                 DiagnosticsClientBuilder builder = new("dotnet-trace", 10);
@@ -220,7 +257,6 @@ namespace Microsoft.Diagnostics.Tools.Trace
                     bool rundownRequested = false;
                     System.Timers.Timer durationTimer = null;
 
-
                     using (VirtualTerminalMode vTermMode = printStatusOverTime ? VirtualTerminalMode.TryEnable() : null)
                     {
                         EventPipeSession session = null;
@@ -281,7 +317,31 @@ namespace Microsoft.Diagnostics.Tools.Trace
                             ConsoleWriteLine("\n\n");
 
                             FileInfo fileInfo = new(output.FullName);
-                            Task copyTask = session.EventStream.CopyToAsync(fs);
+                            EventMonitor eventMonitor = null;
+                            Task copyTask = null;
+                            if (hasStoppingEventProviderName)
+                            {
+                                eventMonitor = new(
+                                    stoppingEventProviderName,
+                                    stoppingEventEventName,
+                                    payloadFilter,
+                                    onEvent: (traceEvent) =>
+                                    {
+                                        shouldExit.Set();
+                                    },
+                                    onPayloadFilterMismatch: (traceEvent) =>
+                                    {
+                                        ConsoleWriteLine($"One or more field names specified in the payload filter for event '{traceEvent.ProviderName}/{traceEvent.EventName}' do not match any of the known field names: '{string.Join(' ', traceEvent.PayloadNames)}'. As a result the requested stopping event is unreachable; will continue to collect the trace for the remaining specified duration.");
+                                    },
+                                    eventStream: new PassthroughStream(session.EventStream, fs, (int)buffersize, leaveDestinationStreamOpen: true),
+                                    callOnEventOnlyOnce: true);
+
+                                copyTask = eventMonitor.ProcessAsync(CancellationToken.None);
+                            }
+                            else
+                            {
+                                copyTask = session.EventStream.CopyToAsync(fs);
+                            }
                             Task shouldExitTask = copyTask.ContinueWith(
                                 (task) => shouldExit.Set(),
                                 CancellationToken.None,
@@ -339,6 +399,11 @@ namespace Microsoft.Diagnostics.Tools.Trace
                                 {
                                     printStatus();
                                 } while (!copyTask.Wait(100));
+
+                                if (eventMonitor != null)
+                                {
+                                    await eventMonitor.DisposeAsync().ConfigureAwait(false);
+                                }
                             }
                             // At this point the copyTask will have finished, so wait on the shouldExitTask in case it threw
                             // an exception or had some other interesting behavior
@@ -455,7 +520,10 @@ namespace Microsoft.Diagnostics.Tools.Trace
                 CommonOptions.NameOption(),
                 DiagnosticPortOption(),
                 ShowChildIOOption(),
-                ResumeRuntimeOption()
+                ResumeRuntimeOption(),
+                StoppingEventProviderNameOption(),
+                StoppingEventEventNameOption(),
+                StoppingEventPayloadFilterOption()
             };
 
         private static uint DefaultCircularBufferSizeInMB() => 256;
@@ -546,5 +614,29 @@ namespace Microsoft.Diagnostics.Tools.Trace
             {
                 Argument = new Argument<bool>(name: "resumeRuntime", getDefaultValue: () => true)
             };
+
+        private static Option StoppingEventProviderNameOption() =>
+            new(
+                alias: "--stopping-event-provider-name",
+                description: @"A string, parsed as-is, that will stop the trace upon hitting an event with the matching provider name. For a more specific stopping event, additionally provide `--stopping-event-event-name` and/or `--stopping-event-payload-filter`.")
+            {
+                Argument = new Argument<string>(name: "stoppingEventProviderName", getDefaultValue: () => null)
+            };
+
+        private static Option StoppingEventEventNameOption() =>
+            new(
+                alias: "--stopping-event-event-name",
+                description: @"A string, parsed as-is, that will stop the trace upon hitting an event with the matching event name. Requires `--stopping-event-provider-name` to be set. For a more specific stopping event, additionally provide `--stopping-event-payload-filter`.")
+            {
+                Argument = new Argument<string>(name: "stoppingEventEventName", getDefaultValue: () => null)
+            };
+
+        private static Option StoppingEventPayloadFilterOption() =>
+            new(
+                alias: "--stopping-event-payload-filter",
+                description: @"A string, parsed as [payload_field_name]:[payload_field_value] pairs separated by commas, that will stop the trace upon hitting an event with a matching payload. Requires `--stopping-event-provider-name` and `--stopping-event-event-name` to be set.")
+            {
+                Argument = new Argument<string>(name: "stoppingEventPayloadFilter", getDefaultValue: () => null)
+            };
     }
 }
index 1beb21c32d2be5966aca2a59a07a95c9e630ee4d..a4ed81dde478a43095e1d9a18e30dcf9d35de45c 100644 (file)
@@ -17,6 +17,7 @@
 
   <ItemGroup>
     <ProjectReference Include="$(MSBuildThisFileDirectory)..\..\Microsoft.Diagnostics.NETCore.Client\Microsoft.Diagnostics.NETCore.Client.csproj" />
+    <ProjectReference Include="$(MSBuildThisFileDirectory)..\..\Microsoft.Diagnostics.Monitoring.EventPipe\Microsoft.Diagnostics.Monitoring.EventPipe.csproj" />
   </ItemGroup>
 
   <ItemGroup>