tag:status.nextup.ai,2005:/history Nextup.ai Status - Incident History 2024-03-27T22:50:03-04:00 Nextup.ai
tag:status.nextup.ai,2005:Incident/17846470 2023-07-13T17:29:31-04:00 2023-07-13T17:29:31-04:00 Degraded API Performance in HD+
<p><small>Jul <var data-var='date'>13</var>, <var data-var='time'>17:29</var> EDT</small><br><strong>Resolved</strong> - There have been no issues logged in the past four hours; this issue is now resolved.</p><p><small>Jul <var data-var='date'>13</var>, <var data-var='time'>14:15</var> EDT</small><br><strong>Monitoring</strong> - We've deployed an update and are actively monitoring application performance to ensure that the issue is resolved.</p><p><small>Jul <var data-var='date'>13</var>, <var data-var='time'>13:49</var> EDT</small><br><strong>Identified</strong> - We believe we have identified the root cause and are working towards resolution. We are optimistic that this will be resolved today.</p><p><small>Jul <var data-var='date'>13</var>, <var data-var='time'>13:48</var> EDT</small><br><strong>Update</strong> - We are continuing to investigate this issue.</p><p><small>Jul <var data-var='date'>13</var>, <var data-var='time'>13:48</var> EDT</small><br><strong>Investigating</strong> - We have been experiencing degraded API response times in the most recent update of our platform. We started receiving alerts a few hours after the most recent update and our team has been investigating this issue since receiving those alerts.
We believe we have identified the root cause and are currently testing a resolution.</p>
tag:status.nextup.ai,2005:Incident/17596553 2023-06-16T16:08:30-04:00 2023-06-16T16:38:21-04:00 Degraded Performance and Inability to Access Portal
<p><small>Jun <var data-var='date'>16</var>, <var data-var='time'>16:08</var> EDT</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Jun <var data-var='date'>16</var>, <var data-var='time'>15:51</var> EDT</small><br><strong>Monitoring</strong> - We've identified the issue and have implemented a fix. We are working to confirm that the fix has taken effect and that all services are functioning normally.</p><p><small>Jun <var data-var='date'>16</var>, <var data-var='time'>14:17</var> EDT</small><br><strong>Update</strong> - We are continuing to investigate this issue.</p><p><small>Jun <var data-var='date'>16</var>, <var data-var='time'>14:17</var> EDT</small><br><strong>Investigating</strong> - We are aware of slowness in the application and of users being unable to log in to the Nextup administrative portal. We are currently investigating the issues and will update this page once we've isolated the cause.</p>
tag:status.nextup.ai,2005:Incident/17563463 2023-06-13T17:02:12-04:00 2023-06-13T17:02:12-04:00 Elevated AWS Error Rates
<p><small>Jun <var data-var='date'>13</var>, <var data-var='time'>17:02</var> EDT</small><br><strong>Resolved</strong> - This incident has been resolved. For more details, please see https://health.aws.amazon.com/health/status which outlines the incident within AWS.</p><p><small>Jun <var data-var='date'>13</var>, <var data-var='time'>15:37</var> EDT</small><br><strong>Update</strong> - We are continuing to monitor for any further issues.</p><p><small>Jun <var data-var='date'>13</var>, <var data-var='time'>15:37</var> EDT</small><br><strong>Monitoring</strong> - AWS is experiencing elevated error rates in several services, including Lambda.
This is affecting the Nextup services because Lambda is an entry point to them.</p>
tag:status.nextup.ai,2005:Incident/11958992 2022-10-13T17:17:23-04:00 2022-10-13T17:17:23-04:00 Duplicate Messages
<p><small>Oct <var data-var='date'>13</var>, <var data-var='time'>17:17</var> EDT</small><br><strong>Resolved</strong> - Slack's service is starting to recover, and we are no longer seeing issues with our service.</p><p><small>Oct <var data-var='date'>13</var>, <var data-var='time'>15:03</var> EDT</small><br><strong>Identified</strong> - These issues are confirmed to be caused by errors with the Slack APIs - please refer to Slack's status page for the latest updates - https://status.slack.com/</p><p><small>Oct <var data-var='date'>13</var>, <var data-var='time'>10:17</var> EDT</small><br><strong>Investigating</strong> - We are currently being affected by an issue on the Slack API which is causing some message duplication in threads. <br /><br />Check the latest Slack status at https://status.slack.com/</p>
tag:status.nextup.ai,2005:Incident/9509320 2022-03-09T16:00:00-05:00 2022-03-10T10:03:40-05:00 Degraded Response Times
<p><small>Mar <var data-var='date'> 9</var>, <var data-var='time'>16:00</var> EST</small><br><strong>Resolved</strong> - Some customers experienced degraded response times and/or an inability to auto-create issues from Helpdesk+. This issue lasted from approximately 3:40 PM Eastern US time until 4:00 PM Eastern US time. After reviewing the symptoms, we found that AWS experienced issues with SQS that affected our services in the US EAST 1 / Virginia region and resulted in elevated error rates.</p>
tag:status.nextup.ai,2005:Incident/7755918 2021-08-13T14:52:36-04:00 2021-08-13T14:52:36-04:00 Slack connectivity issues
<p><small>Aug <var data-var='date'>13</var>, <var data-var='time'>14:52</var> EDT</small><br><strong>Resolved</strong> - Slack is reporting this incident is now resolved.
More details are available here: https://status.slack.com/2021-08-13</p><p><small>Aug <var data-var='date'>13</var>, <var data-var='time'>09:56</var> EDT</small><br><strong>Monitoring</strong> - Slack has identified an issue in their systems that is affecting the ability to add new bots. Further details can be seen here: https://status.slack.com/2021-08/a6a3b58db77f9b9d</p>
tag:status.nextup.ai,2005:Incident/7644142 2021-07-31T06:39:05-04:00 2021-07-31T06:39:05-04:00 German Data Center Timeouts
<p><small>Jul <var data-var='date'>31</var>, <var data-var='time'>06:39</var> EDT</small><br><strong>Resolved</strong> - There have been no additional errors in the system in the last 24 hours and we are resolving this issue. We will continue to look for ways to better monitor for events like these in the future.</p><p><small>Jul <var data-var='date'>30</var>, <var data-var='time'>11:44</var> EDT</small><br><strong>Monitoring</strong> - One of our internal services became unresponsive and started experiencing timeouts. Our monitoring systems did not detect this, so the service was not automatically restarted. This caused our system to experience an outage when communicating with this service. <br /><br />Services were restarted and the issue was resolved.
We are still investigating the root cause of this internal service outage.</p>
tag:status.nextup.ai,2005:Incident/6709346 2020-06-26T09:00:00-04:00 2021-04-29T15:04:22-04:00 Invalid Error Notifications
<p><small>Jun <var data-var='date'>26</var>, <var data-var='time'>09:00</var> EDT</small><br><strong>Resolved</strong> - A customer reported on June 25, 2020 that they had received an issue update from a user who is not an employee.<br /><br />We investigated the issue and found a defect in the application that caused error messages to be combined across teams in an invalid way.<br /><br />What happened<br />- A customer created a comment and transitioned an issue in Jira, and the comment contained a special character that cannot be displayed in Slack.<br />- The special character was detected and replaced with an error message indicating the comment could not be displayed.<br />- A message containing the transition and the error was sent to the customer.<br />- A subsequent customer then added a comment that also contained a special character.<br />- The system did not properly clear the prior error from cache, combined the updates, and sent a message to the second team which indicated a change from an invalid user.<br />- We located the place in the logic that was not properly clearing the prior messages and resolved the issue.<br /><br />The hotfix was deployed on June 26, 2020.</p>
tag:status.nextup.ai,2005:Incident/6709356 2020-05-15T11:30:00-04:00 2021-04-29T15:04:22-04:00 Notifications with incorrect users
<p><small>May <var data-var='date'>15</var>, <var data-var='time'>11:30</var> EDT</small><br><strong>Resolved</strong> - On Thursday at 11:30 EST we were on a support call with a client, during which we were working out issues with our notifications for this customer.<br /><br />We then placed some test notifications in what we believed to be a single team's queue.
These messages were accidentally added to several teams' queues.<br /><br />On Friday May 15, 2020 at 11:26 the notification service sent updates to approximately 10 bots that included valid Jira issue information but invalid updates attributed to users unknown to the customer.<br /><br />These updates may have occurred one or more times during the affected period, which has been identified as May 15, 2020.<br /><br />There is no indication of a breach of our systems and customer Jira instances are not believed to be incorrectly updated. The issue appears to be due to user error and not due to a security incident.<br /><br />What we believe occurred<br />A Nextup admin was testing a client issue and incorrectly placed test messages in multiple teams' queues.<br />An update was later received from Jira for a valid change.<br />The application logic did not correctly pass the team element between services.<br />The orphaned update was incorrectly attached to an additional valid notification.<br />Additional validations have been added to the system, along with additional logging to capture potential future occurrences and additional debugging procedures to prevent future issues.<br /><br />The system processes millions of notifications daily and additional teams have not reported any issues after the known incident. We are continuing to monitor the application for future occurrences.</p>