This document walks through AI-assisted incident management in Atlassian Service Collection. It shows how alerts are processed, classified, and managed so that responders can resolve incidents faster and with less alert fatigue.

Begin by opening the Alert queue in Atlassian Service Collection to access the AI-powered incident management interface.

Review the alerts displayed. In the AI view they are grouped by similarity, and a classification is shown alongside them.

The classification model uses historical data and general information to assess which alert groups require immediate responder attention. Each group is classified as signal or noise, and you can confirm or reject that classification.

The model improves over time as responders classify unclassified groups and accept suggested classifications. Alerts are de-duplicated within each group, so a responder is notified only once per group and can see which groups need attention.
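The feedback loop described above can be illustrated with a toy example. This is only a sketch, assuming a simple majority vote over responder feedback; the class name `FeedbackClassifier`, the `fingerprint` key, and the voting rule are all hypothetical and say nothing about how the actual classification model works.

```python
from collections import Counter, defaultdict


class FeedbackClassifier:
    """Toy signal/noise suggester that learns from responder feedback.

    Hypothetical illustration only: the real model is not described in
    this walkthrough and is certainly more sophisticated than a vote count.
    """

    def __init__(self):
        # Maps an alert-group fingerprint to counts of "signal"/"noise" votes.
        self._votes = defaultdict(Counter)

    def record_feedback(self, fingerprint: str, label: str) -> None:
        """Store one responder decision (confirming or correcting a suggestion)."""
        if label not in ("signal", "noise"):
            raise ValueError(f"unknown label: {label}")
        self._votes[fingerprint][label] += 1

    def suggest(self, fingerprint: str) -> str:
        """Return the majority label, or "unclassified" with no history or a tie."""
        votes = self._votes.get(fingerprint)
        if not votes:
            return "unclassified"
        signal, noise = votes["signal"], votes["noise"]
        if signal == noise:
            return "unclassified"
        return "signal" if signal > noise else "noise"


clf = FeedbackClassifier()
first = clf.suggest("cpu-high")            # no feedback yet
clf.record_feedback("cpu-high", "signal")  # a responder marks the group as signal
second = clf.suggest("cpu-high")           # suggestion now reflects that feedback
```

Each confirmed or rejected classification adds one more vote, which is the sense in which the suggestions get better as responders keep classifying groups.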

Consider an alert group containing 20 alerts. Instead of 20 notifications, responders receive a single notification, which reduces alert fatigue. In this example the group is classified as P2, but it is likely noise because it coincides with scheduled maintenance.
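The 20-alerts-to-one-notification behavior can be sketched as follows. This is a minimal illustration, assuming each alert already carries a grouping key; the `Alert` type, its `group_key` field, and `notifications_for` are hypothetical names, not part of any Atlassian API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    alert_id: str
    group_key: str  # hypothetical similarity key assigned upstream
    message: str


def notifications_for(alerts):
    """Collapse alerts into one notification per group."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[alert.group_key].append(alert)
    # One line per group, regardless of how many alerts the group holds.
    return [
        f"{key}: {len(members)} alert(s), e.g. {members[0].message}"
        for key, members in groups.items()
    ]


# Twenty similar alerts raised during a maintenance window.
alerts = [
    Alert(f"a{i}", "db-maintenance", "Database unreachable") for i in range(20)
]
notes = notifications_for(alerts)
print(notes)  # a single entry covering all 20 alerts
```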

If a group looks like a genuine signal, such as high CPU utilization, responders should review it closely. Open that alert group to explore it further.

Inside the alert group, you'll find AI-generated suggestions powered by Rovo that help responders resolve the group more efficiently.

The system recommends responders based on past similar incidents and suggests reviewing the history of alert groups. Each individual alert can be examined for more details.

When creating an incident, AI assists by providing a pre-filled summary and description.

AI also suggests an incident priority based on similar past incidents, drawing on the Teamwork Graph, Atlassian's data layer that connects teams' work, goals, and knowledge.
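One way to picture "priority based on similar past incidents" is a nearest-neighbor vote. The sketch below is an assumption-laden illustration: it measures similarity with word overlap (Jaccard) on incident summaries and takes the most common priority among the closest matches. The function names, the `k` and `min_similarity` parameters, and the sample incidents are all hypothetical; the product's actual method is not described in this walkthrough.

```python
from collections import Counter


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two summaries, from 0.0 to 1.0."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def suggest_priority(summary, past_incidents, k=3, min_similarity=0.2):
    """Return the most common priority among the k most similar past incidents.

    past_incidents is a list of (summary, priority) pairs.
    Returns None when nothing is similar enough to base a suggestion on.
    """
    scored = [(jaccard(summary, past), prio) for past, prio in past_incidents]
    similar = sorted(
        (item for item in scored if item[0] >= min_similarity),
        key=lambda item: item[0],
        reverse=True,
    )[:k]
    if not similar:
        return None
    counts = Counter(prio for _, prio in similar)
    return counts.most_common(1)[0][0]


# Made-up history for illustration.
past = [
    ("High CPU utilization on payments service", "P1"),
    ("High CPU utilization on checkout service", "P1"),
    ("Disk usage warning on build agent", "P3"),
]
suggested = suggest_priority("High CPU utilization on orders service", past)
print(suggested)  # P1
```

Returning `None` when no past incident clears the similarity threshold keeps the suggestion optional, which matches the walkthrough's framing of AI output as something the responder reviews rather than accepts blindly.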

Once the incident is created, review it for accuracy and completeness.

Rovo offers suggestions and actions for managing the incident efficiently, including a summary, a potential root cause, and similar historical incidents.

Recommendations include assigning team members and adding affected services so the incident can be resolved swiftly. If no affected services were added when the incident was created, add them now.

Affected services help define the incident's impact and are critical for understanding its full scope. Consider updating severity and including additional team members who can assist in resolution.

Resources from the Teamwork Graph, such as related incidents and knowledge base articles, provide essential context for quick resolution.

When an incident is confirmed as major, activate the major incident toggle to initiate additional actions, such as creating a Slack channel for team collaboration and gathering observability data.

Join the Slack channel, where AI will summarize the incident for new responders, ensuring everyone is informed and ready to collaborate.

Return to the incident record for further examination.

With the major incident toggle activated, observability data from New Relic appears on the incident, although technical issues may occasionally prevent it from loading.

Next, gather additional context from the Rovo Ops agent for a fuller understanding of the incident.

Ask the agent for recent critical alerts from the New Relic account to gain insight into the incident's scope and context. Then verify the potential root cause with the Rovo Ops agent's help.

Review documentation and analysis supporting the suggested root cause, ensuring confidence in resolving the incident.

Once the incident is resolved, the Rovo Ops agent drafts a post-incident report (PIR), which streamlines documentation and review.

Post-incident activities can be exhausting, so the draft PIR allows for efficient review and closure.

The draft PIR requires human approval, which keeps responders engaged in the final review.

Open the draft PIR, which Rovo automation has already published to Confluence, and review it.

Ensure the PIR contains a complete narrative, impact details, and supporting information before final approval and distribution.
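A completeness check like this can be expressed as a simple checklist. The sketch below assumes the PIR has been loaded into a dictionary keyed by section name; the section names in `REQUIRED_SECTIONS` and the `missing_sections` helper are hypothetical and do not reflect a fixed Confluence template.

```python
# Hypothetical section names; adapt to whatever template your team uses.
REQUIRED_SECTIONS = ("summary", "timeline", "impact", "root_cause", "references")


def missing_sections(pir: dict) -> list:
    """Return required sections that are absent or blank in the draft PIR."""
    missing = []
    for name in REQUIRED_SECTIONS:
        value = pir.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


draft = {
    "summary": "CPU saturation on checkout service",
    "timeline": "",  # present but still empty
    "impact": "Elevated latency for checkout requests",
}
gaps = missing_sections(draft)
print(gaps)  # ['timeline', 'root_cause', 'references']
```

An empty result would mean the draft is ready for a human reviewer to read and approve; it does not replace that review.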

The PIR also includes links to essential references and the Slack channel for further context.
