Troubleshooting IIS ARR Bad Gateway Timeout Issues

I recently encountered an issue whereby Lync 2013 mobile clients would consistently disconnect when utilising IIS ARR 3.0 on Microsoft Windows Server 2012 R2. I have deployed ARR for Lync Server 2013 on several occasions and have always followed the guidelines around timeout values, specifically for the public facing web services URL, as this is known to cause issues if the timeout value is below 200 seconds. Initially, in this particular customer's case, I was sure the problem was not ARR related as I had performed this process several times previously. However, after testing with a public IP address directly on the ARR server in order to rule out TCP timeout issues on the customer's firewall appliance, the problem still occurred and I needed a way to look further into what IIS was reporting. The following details the process I went through to identify the issue and reach a resolution.

1. Firstly, we need to take a look at the log file that the Lync Mobile client produces when the issue occurs. Ensure logging is enabled under the client's options and then reproduce the issue. Following this, if you enter the application's options for a second time there is an additional highlighted option to send the log files to an e-mail address. Once this is done, open the log file on a workstation and search for the word “gateway”. If you find a match similar to “E_BadGateway (E2-3-35)”, you have a timeout issue. In fact, any bad gateway error reported in the Lync Mobile client logs is timeout related, so the key is to determine where the timeout has occurred. In my case, I was seeing the following error in my client side log, which was consistent on Windows, Android and iOS based devices.

Bad Gateway

2. After ruling out the customer's firewall appliance by placing a public IP address on the ARR server itself, I knew the issue had to be either the Lync Server Front End, the ARR server or the customer's router, although the latter was the least likely. Following this I decided to take a look at the IIS log files. The first issue I stumbled across here was that I had not installed the IIS logging role services, and as a result no log files had been generated. For reference, to obtain the log files needed to diagnose the problem you will need to add the “HTTP Logging” and “Tracing” Web Server role services from the Windows Server 2012 R2 Server Manager. Below is a screenshot of the options that are required.

Role Services

Once these role services are installed, you will need to reproduce the issue on a Lync Mobile client in order for a log to be generated. The log file will then be viewable on the ARR server under “C:\inetpub\logs\LogFiles\W3SVC1\”. In this folder you will see that one or more log files have been generated, and each contains time and date stamped entries for the requests the server has processed. Searching the file for “502” will locate your timeout event. In my case I was experiencing a 502.3 error, as detailed below.

2014-12-30 11:50:02 GET /ucwa/v1/applications/21223095915/events ack=3&low=5&medium=5&timeout=180&priority=141994047&X-ARR-CACHE-HIT=0&X-ARR-LOG-ID=42ec39b2-cb68-4ad3-8ab8-8ba781a7bcba 443 - 151.228.9.186 Mozilla/5.0+(Windows+Phone+8.1;+ARM;+Trident/7.0;+Touch;+rv:11.0;+IEMobile/11.0;+NOKIA;+Lumia+920)+like+Gecko - 502 3 12002 33124

The end of the entry is the most important part at this stage. We see “502 3”, which means we received a 502.3 error. The 12002 that follows is the Win32 status code, which denotes a timeout. Interestingly, the very last field shows that 33.124 seconds elapsed without a response, which caused the timeout to occur: the value 33124 is the time taken in milliseconds, which converted to seconds is 33.124. This confused me somewhat initially as I had already configured my web services proxy timeout to be 960 seconds, which should have avoided this timeout issue.
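For larger log files, the same reading of the trailing fields can be scripted. The following is a minimal Python sketch, not part of the original troubleshooting. It assumes each entry ends with the status, sub-status, Win32 status and time-taken fields in that order, as in the entry above. The names `parse_tail` and `find_bad_gateway` are hypothetical, and the sample entry has a shortened query string and user agent.

```python
from typing import Iterable, Iterator, NamedTuple


class ProxyFailure(NamedTuple):
    status: str        # HTTP status and sub-status, e.g. "502.3"
    win32_status: int  # e.g. 12002
    seconds: float     # time-taken converted from milliseconds


def parse_tail(log_line: str) -> ProxyFailure:
    """Read the last four space-separated fields of a log entry.

    Assumption: the entry ends with
    sc-status sc-substatus sc-win32-status time-taken.
    """
    status, substatus, win32_status, time_taken_ms = log_line.split()[-4:]
    return ProxyFailure(
        status=f"{status}.{substatus}",
        win32_status=int(win32_status),
        seconds=int(time_taken_ms) / 1000,
    )


def find_bad_gateway(lines: Iterable[str]) -> Iterator[ProxyFailure]:
    """Yield entries whose status field is 502, skipping '#' header lines."""
    for line in lines:
        fields = line.split()
        if line.startswith("#") or len(fields) < 4:
            continue
        if fields[-4] == "502":
            yield parse_tail(line)


# Shortened version of the entry shown above (illustrative only).
sample = (
    "2014-12-30 11:50:02 GET /ucwa/v1/applications/21223095915/events "
    "ack=3&timeout=180 443 - 151.228.9.186 Mozilla/5.0 - 502 3 12002 33124"
)
print(parse_tail(sample))
```

Checking the status field by position, rather than searching for the text “502” anywhere in the line, avoids false matches on URLs or timings that happen to contain those digits.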

3. To further investigate the issue, we can additionally configure an IIS logging feature called Failed Request Tracing Rules. This feature allows us to trigger a report and accompanying log file whenever a particular condition is matched. In my case I wanted to trigger a trace when a 502.x error was produced. This can be configured by opening the IIS Management Console and selecting the “Server Name” node to display the IIS Home options.

Failed Request Tracing Rules

After double clicking the Failed Request Tracing Rules icon, click Add in the top right hand corner of the screen and a new dialog box will be presented. Click Next on the first screen, then in the next window enter 502 in the “Status Codes” field and click Next. In the “Select Trace Providers” window, ensure only “WWW Server” is selected and click Finish.

4. Once the tracing rule has been created, reproduce the timeout issue on a Lync Mobile client again. When complete, a new IIS log file will be available in “C:\inetpub\logs\FailedReqLogFiles\W3SVC1” and an accompanying report will have been produced in XML format, typically named “fr000001”. Open the XML file in Internet Explorer and review the content to identify which application routing request triggered the timeout, as detailed below.

Failed Request Tracing Rules Report

As we can see under the URL_Changed value, the timeout was actually being encountered when the https://meet.domain.com URL was being queried. This was confirmed by matching the GET command detailed in the original IIS log file to the URL detailed in the Failed Request Tracing report. On increasing the proxy timeout value for the meet.domain.com server farm in IIS ARR to 960 seconds, the issue was resolved. I cannot currently say for sure why this problem occurred specifically within this environment and not in others that I have deployed, or why the reverse proxy was seeing a URL of https://meet.domain.com/ucwa/v1/applications, which is a web services directory and therefore means the queried URL is actually invalid. I even went as far as installing ARR 2.5 on Windows Server 2012 and experienced an identical issue. Hopefully this will assist someone else from a troubleshooting perspective at least.

AudioCodes E-SBC – Removing ;ext= From An INVITE Header

Recently I was performing an integration between Microsoft Lync Server 2013 and an Alcatel OmniPCX phone system at a customer site. The existing PBX already had an element of VoIP via a separate platform called OpenTouch, which allowed users to have multiple devices associated with a single extension. For an unknown reason, when calling OpenTouch users via Lync Server 2013, which caused the call to be sent to an AudioCodes Virtual E-SBC and then on to the Alcatel PBX via a direct SIP trunk, the user's handset would not ring. After working with the customer's Alcatel vendor, it transpired that the OpenTouch platform did not like INVITEs that were sent with a number in the form +441234567890;ext=7890, for example, and the ;ext= element was causing the issue. To resolve this, the ;ext= element needed to be stripped from the INVITE header using a Message Manipulation on the AudioCodes E-SBC. As with Sonus devices, a regular expression is required in order to remove any parts of an INVITE we do not need. On an AudioCodes device running version 6.8 of the firmware, the following was performed.

1. Once logged into the device, expand VoIP -> SIP Definitions -> Msg Policy & Manipulations and then select Message Manipulations.

2. In the Message Manipulations table, click Add and provide an Index number, such as 1 if this is your first manipulation, and then proceed to configure the manipulation as per the provided screenshot below.

Message Manipulation

3. To understand how Message Manipulation works, the Condition section is where we look for specific information within the INVITE. In this case we are using a regular expression to place each part of the INVITE header into a variable. For example, the (.*) part of the condition places its value into variable $1, which would be the +441234567890 part of the number only. The next part of the regular expression, (;ext=), would be placed in variable $2, which would only be the ;ext= part of the header, and so on. By placing each part of the header into a variable we can then control the manipulation by specifying an Action Value, which constructs the header using only the specific parts that we wish to use. For example, to gain a result that removes the ;ext= element, we would use $1 (+441234567890), then $4 (@) and then $5 (domain.com) to construct a header that displays +441234567890@domain.com.

4. Once this is complete, expand VoIP -> VoIP Network -> IP Group and then edit the IP Group you wish to apply the message manipulation to. When the IP Group properties open, enter the Manipulation Set ID that you entered in step 2 into either the inbound or outbound manipulation set ID field, depending on the direction in which you need to apply the manipulation, as illustrated below.

SBC Manipulation Set

That’s it. When performing a debug trace on the gateway you should now see that the ;ext= element has been removed from the INVITE header.
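The capture group logic described in step 3 can be tried outside the SBC before committing a rule. The following Python sketch is an illustration only and does not use AudioCodes syntax. The pattern is an assumption that mirrors the five groups described above, with the third group assumed to hold the extension digits, and `strip_ext` is a hypothetical helper name.

```python
import re

# Hypothetical pattern mirroring the five groups from step 3:
# $1 number, $2 ";ext=", $3 extension digits (assumed), $4 "@", $5 domain.
PATTERN = re.compile(r"(.*)(;ext=)(\d+)(@)(.*)")


def strip_ext(user_host: str) -> str:
    """Rebuild the value from groups 1, 4 and 5, dropping ';ext=' and the extension."""
    match = PATTERN.fullmatch(user_host)
    if match is None:
        # No ;ext= element present, so leave the value untouched.
        return user_host
    return match.group(1) + match.group(4) + match.group(5)


print(strip_ext("+441234567890;ext=7890@domain.com"))
# prints +441234567890@domain.com
```

Values without an ;ext= element are returned unchanged, which matches the intent of only manipulating headers that satisfy the condition.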

Thanks to Siplifi (https://www.siplifi.com) for the assistance.

Lync Server 2013 – Cannot Setup Mirroring Database

I recently implemented a Lync Server 2013 infrastructure with a mirrored Microsoft SQL backend. During the configuration of the mirror, a network interruption occurred which resulted in the creation of the mirror database failing. Once the network issue had been resolved, I attempted to recreate the mirror via the “Install Databases” option in the topology builder, only to find that the process failed with the following error:

“Cannot setup mirroring because there is an error when validating the current database states on primary and mirror instances for the database “rtcxds”. Exception: System.InvalidOperationException: Cannot setup mirror database “rtcxds” because it already exists.”

Despite manually attempting to create the mirror via the Lync Server Management Shell, the error persisted. In order to resolve the problem, the following was performed.

1. Open the SQL Management Studio on the primary database server and locate the affected database, in my case it was “rtcxds”.

2. Right click the database and click “Detach” and then click OK when the following window is displayed.

3. Open the SQL Management Studio on the mirror database server and locate the affected database.

4. Right click the database and click “Delete”. You won’t be able to detach this database as it is not the primary.

5. Remove the created database files on each SQL server and then open the Lync Server topology builder.

6. In the topology builder right click “Lync Server 2013” and then click “Install Databases”.

7. Follow the install databases wizard and the mirror should now deploy correctly.

Exchange 2007 Unified Messaging & Lync Server 2013 – Key Mapping Issue

I recently deployed a Microsoft Lync Server 2013 infrastructure for a customer running Microsoft Exchange Server 2007 SP3. While this version of Exchange Server is supported, it should be noted that there are a few additional amendments that need to be made in order to make UM Auto Attendant key mappings to Lync extensions work correctly. Following the typical UM integration through the use of OCSUMUtil.exe and ExchUMUtil.ps1, all Unified Messaging functionality seemed to be working correctly, including dial by extension. It wasn’t until a key mapping was added to an Auto Attendant to transfer a call to a specific Lync extension that I saw an issue. When calling the attendant and pressing one, for example, which was directed to extension 319, the Unified Messaging service would produce the following error and the attendant would tell the caller “The call could not be transferred”.

As you can see, there is not much detail to go on, and researching this particular Event ID suggested the error could pertain to a number of issues. On researching further, however, I noted the following from the TechNet article on integrating Lync Server 2013 with Exchange Unified Messaging:

If you are using a version of Exchange that is earlier than Microsoft Exchange Server 2010 SP1, you must enter the fully qualified domain name (FQDN) of the corresponding Exchange Unified Messaging (UM) SIP dial plan in the Lync Server 2013 dial plan Simple name field. If you are using Microsoft Exchange Server 2010 SP1 or latest service pack, this dial plan name matching is not necessary.

In order to resolve the key mapping issue the following was performed.

1. Connect to the Lync Server 2013 control panel and click Voice Routing and then select the Dial Plans tab.

2. Double click the “Global” dial plan to edit it and in the Simple Name dialog box, remove the word Global and replace it with the name of your Exchange Unified Messaging dial plan followed by your internal Active Directory domain name. For example, if my UM Dial Plan name was “DefaultUM” and my internal domain was “company.local”, I would enter DefaultUM.company.local into the Simple Name field.

3. Click OK and then commit the change. You will then need to wait a few moments for the change to take effect before trying the key mapping again. It should also be noted that the Global dial plan will need sufficient normalisation rules for the key mapping to work when transferring to an extension. In my case the dial plan now looked like the following:

That’s it, hopefully your Auto Attendant key mapping issues to Lync extensions will now be resolved.