Troubleshooting IIS ARR Bad Gateway Timeout Issues

I recently encountered an issue whereby Lync 2013 mobile clients would consistently disconnect when utilising IIS ARR 3.0 on Microsoft Windows Server 2012 R2. I have deployed ARR for Lync Server 2013 on several occasions and have always followed the guidelines around timeout values, specifically for the public facing web services URL, as this is known to cause issues if the timeout value is below 200 seconds. Initially, in this particular customer’s case, I was sure the problem was not ARR related, as I had performed this process several times previously. However, after placing a public IP address directly on the ARR server to rule out TCP timeout issues on the customer’s firewall appliance, the problem still occurred and I needed a way to look further into what IIS was reporting. The following details the process I went through to identify the issue and reach a resolution.

1. Firstly, we need to take a look at the log file that the Lync Mobile client produces when the issue occurs. Ensure logging is enabled under the client’s options and then reproduce the issue. If you then enter the application’s options a second time, there is an additional highlighted option to send the log files to an e-mail address. Once this is done, open the log file on a workstation and search for the word “gateway”. If you receive a match similar to “E_BadGateway (E2-3-35)”, you have a timeout issue. In fact, any bad gateway error reported in the Lync Mobile client logs is timeout related, and the key factor is determining where the timeout occurred. In my case, I was seeing the following error in my client side log, which was consistent across Windows, Android and iOS based devices.

[Screenshot: Bad Gateway]

2. After ruling out the customer’s firewall appliance by placing a public IP address on the ARR server itself, I knew the issue had to lie with either the Lync Server Front End, the ARR server or the customer’s router, although the latter was the most unlikely. I then decided to take a look at the IIS log files. The first issue I stumbled across was that I had not installed the IIS logging role services, and as a result no log files had been generated. For reference, to obtain the log files needed to diagnose the problem you will need to add the “HTTP Logging” and “Tracing” Web Server role services from the Windows Server 2012 R2 Server Manager. Below is a screenshot of the options that are required.

[Screenshot: Role Services]

Once these role services are installed, reproduce the issue on a Lync Mobile client so that a log is generated. The log file will then be viewable on the ARR server under “C:\inetpub\logs\LogFiles\W3SVC1\”. In this folder you will see that one or more log files have been generated, and opening a log file presents a time and date stamped entry for each request the server has processed. Searching this file for “502” will locate your timeout event. In my case I was experiencing a 502.3 error, as detailed below.

2014-12-30 11:50:02 GET /ucwa/v1/applications/21223095915/events ack=3&low=5&medium=5&timeout=180&priority=141994047&X-ARR-CACHE-HIT=0&X-ARR-LOG-ID=42ec39b2-cb68-4ad3-8ab8-8ba781a7bcba 443 - 151.228.9.186 Mozilla/5.0+(Windows+Phone+8.1;+ARM;+Trident/7.0;+Touch;+rv:11.0;+IEMobile/11.0;+NOKIA;+Lumia+920)+like+Gecko - 502 3 12002 33124

The end of the entry is the part that is most important at this stage. We see “502 3”, which means we received a 502.3 error, and interestingly we also see that 33.124 seconds elapsed without a response, which caused the timeout to occur. This is shown by the very last field in the above output: the 33124 is expressed in milliseconds, which converted to seconds is 33.124. This confused me somewhat initially, as I had already configured my web services proxy timeout to be 960 seconds, which should have avoided this timeout issue.

3. To further investigate the issue, we can additionally configure an IIS feature entitled Failed Request Tracing Rules. This feature allows us to trigger a report and accompanying log file whenever a particular condition is matched, and in my case I wanted it to trigger when a 502.x error was produced. This can be configured by opening the IIS Management Console and selecting the “Server Name” node to display the IIS Home options.

[Screenshot: Failed Request Tracing Rules]

Double click the Failed Request Tracing Rules icon and then click Add in the top right hand corner of the screen, and a new dialog box will be presented. Click Next on the first screen, then on the next screen enter 502 in the “Status Codes” box and click Next. In the “Select Trace Providers” window, ensure only “WWW Server” is selected and click Finish.

4. Once the tracing rule has been created, again reproduce the timeout issue on a Lync Mobile client. When complete, a new IIS log file will be available in “C:\inetpub\logs\FailedReqLogFiles\W3SVC1” and an accompanying report will have been produced in XML format, typically entitled “fr000001”. Open the XML file in Internet Explorer and, when reviewing the content, we will be able to identify which application routing request triggered the timeout, as detailed below.

[Screenshot: Failed Request Tracing Rules Report]

As we can see under the URL_Changed value, the timeout was actually being encountered when the https://meet.domain.com URL was being queried. This was confirmed by matching the GET request detailed in the original IIS log file to the URL detailed in the Failed Request Tracing report. On increasing the proxy timeout value for the meet.domain.com server farm in IIS ARR to 960 seconds, the issue was resolved. I cannot currently say for sure why this problem occurred specifically within this environment and not in others that I have deployed, nor why the reverse proxy was seeing a URL of https://meet.domain.com/ucwa/v1/applications, which is a web services directory and therefore means the queried URL is actually invalid. I even went as far as installing ARR 2.5 on Windows Server 2012 and experienced an identical issue. Hopefully this will assist someone else from a troubleshooting perspective at least.
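For anyone who would rather not search the IIS log by hand, the check from step 2 can be sketched in a few lines of Python. This is only a rough illustration, not part of the original troubleshooting: it assumes the last four fields of each entry are the status, sub-status, Win32 status and time taken (as in the entry shown in step 2), which will not hold if the logged fields have been customised. The function and class names are my own, and the sample line is an abbreviated copy of the entry above.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class LogStatus:
    status: int         # HTTP status, e.g. 502
    substatus: int      # HTTP sub-status, e.g. 3
    win32_status: int   # Win32 status code reported by IIS
    time_taken_ms: int  # time taken, in milliseconds

    @property
    def time_taken_seconds(self) -> float:
        return self.time_taken_ms / 1000


def parse_status(line: str) -> LogStatus:
    """Read the trailing four fields of one log entry (assumed field order)."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("not enough fields in log line")
    status, substatus, win32_status, time_taken = (int(f) for f in fields[-4:])
    return LogStatus(status, substatus, win32_status, time_taken)


def find_bad_gateway(lines: Iterable[str]) -> Iterator[LogStatus]:
    """Yield every entry whose status is 502, skipping '#' header lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        try:
            entry = parse_status(line)
        except ValueError:
            continue  # line does not end in four numeric fields
        if entry.status == 502:
            yield entry


# Abbreviated copy of the entry from step 2 (query string and user agent shortened).
SAMPLE = (
    "2014-12-30 11:50:02 GET /ucwa/v1/applications/21223095915/events "
    "ack=3&low=5&medium=5&timeout=180 443 - 151.228.9.186 "
    "Mozilla/5.0+(Windows+Phone+8.1) - 502 3 12002 33124"
)

if __name__ == "__main__":
    for entry in find_bad_gateway([SAMPLE]):
        print(f"{entry.status}.{entry.substatus} after {entry.time_taken_seconds} seconds")
```

To run it against a real log, pass an open file from the W3SVC1 folder to find_bad_gateway instead of the sample list.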

AudioCodes E-SBC – Removing ;ext= From An INVITE Header

Recently I was performing an integration between Microsoft Lync Server 2013 and an Alcatel OmniPCX phone system at a customer site. The existing PBX already had an element of VoIP via a separate platform called OpenTouch, which allowed users to have multiple devices associated with a single extension. Calls from Lync Server 2013 to OpenTouch users were sent to an AudioCodes Virtual E-SBC and then on to the Alcatel PBX via a direct SIP trunk, but for an unknown reason the user’s handset would not ring. After working with the customer’s Alcatel vendor, it transpired that the OpenTouch platform did not like INVITEs that were sent with a value of +441234567890;ext=7890, for example, and the ;ext= element was causing the issue. To remove it, the ;ext= element needed to be stripped from the INVITE header using a Message Manipulation on the AudioCodes E-SBC. As with Sonus devices, a regular expression is required in order to remove any parts of an INVITE we do not need. On an AudioCodes device running version 6.8 of the firmware, the following was performed.

1. Log in to the device, expand VoIP -> SIP Definitions -> Msg Policy & Manipulations and then select Message Manipulations.

2. In the Message Manipulations table, click Add and provide an Index number, such as 1 if this is your first manipulation, and then configure the manipulation as per the screenshot below.

[Screenshot: Message Manipulation]

3. To understand how Message Manipulation works, the Condition section is where we look for specific information within the INVITE. In this case we are using a regular expression to place each part of the INVITE header into a variable. For example, the (.*) part of the condition places its match into variable $1, which would be the +441234567890 part of the number only. The next part of the regular expression, (;ext=), is placed in variable $2, which would only be the ;ext= part of the header, and so on. By placing each part of the header into a variable we can then control the manipulation by specifying an Action Value, which constructs the header using only the parts that we wish to keep. For example, to gain a result that removes the ;ext= element, we would use $1 (+441234567890), then $4 (@) and then $5 (domain.com) to construct a header that displays +441234567890@domain.com.

4. Once this is complete, expand VoIP -> VoIP Network -> IP Group and then edit the IP Group you wish to apply the message manipulation to. When the properties of the IP Group open, enter the Manipulation Set ID that you configured in step 2 into either the inbound or outbound manipulation set ID box, depending on the direction in which you need to apply the manipulation, as illustrated below.

[Screenshot: SBC Manipulation Set]

That’s it. When performing a debug trace on the gateway you should now see that the ;ext= element has been removed from the INVITE header.
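The capture group logic described in step 3 can be tried out away from the SBC. The following Python sketch mirrors the idea only; it is not AudioCodes syntax, and the exact pattern is an assumption on my part. In particular, the post does not say what $3 holds, so the third group is assumed here to capture the extension digits. The rebuilt value uses groups 1, 4 and 5, matching the $1, $4 and $5 Action Value described above.

```python
import re

# Assumed pattern: number, ";ext=", extension digits, "@", domain.
# Groups 1, 2, 4 and 5 follow the description in step 3; group 3 is a guess.
EXT_PATTERN = re.compile(r"(.*)(;ext=)(\d+)(@)(.*)")


def strip_ext(user_host: str) -> str:
    """Rebuild the value from groups 1, 4 and 5, dropping ';ext=' and the extension."""
    match = EXT_PATTERN.fullmatch(user_host)
    if match is None:
        return user_host  # nothing to strip, leave the value untouched
    return match.group(1) + match.group(4) + match.group(5)


if __name__ == "__main__":
    print(strip_ext("+441234567890;ext=7890@domain.com"))
```

Values without an ;ext= element do not match the pattern and are passed through unchanged, which is the behaviour you would want from the condition on the SBC as well.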

Microsoft Lync Server 2013 – Consolidated Edge Audio Issue

I was assisting a customer with the renewal of a Consolidated Edge external SSL certificate recently, and upon applying the newly issued certificate to the server, audio calls with externally situated users or federated partners via the Consolidated Edge failed. Upon answering an audio call via the Lync client, it immediately reported “Called Ended”, and third party IP handsets would become stuck in a “Connecting” state once the call was answered by the remote party. While the latter error might have suggested an audio/video port issue at a firewall level, I was on the phone to the customer via a federated call when the certificate was assigned to the Consolidated Edge services, which immediately caused the call to drop, and as such I knew a firewall issue was not the root cause of this problem. On inspection of the Consolidated Edge server’s event log, everything appeared to be fine and all Consolidated Edge specific services were started. Additionally, all other functionality, such as remote and federated instant messaging, was working as expected. However, upon a restart of the Consolidated Edge services the following event was logged:

The Access Edge Server failed to import a shared session key due to invalid signature.

In the past 1 minutes, the server rejected 1 shared session keys presented from the network due to an invalid signature. This suggests an incorrect certificate configuration. A large number of failures could indicate spoofed session key data sent by an attacker.

Cause: This is most likely to be a configuration problem in a server array.

Resolution: Ensure that all servers in an Access Edge Server Array have the same certificate configured for the external IP address on the Edge Interfaces tab. Inspect the serial number of each certificate. If the problem persists, use the Administrator Log to help identify the source of these messages.

Reading the error immediately prompted me to inspect the certificate via the Certificates MMC snap-in. Upon reviewing the new SSL certificate everything appeared to be correct: the certificate chain was reporting as valid, as was the private key. In this customer’s case a GoDaddy SSL certificate had been procured, and while the certificate chain stated the implemented certificate was trusted, it transpired an intermediate GoDaddy certificate was missing. On adding the missing intermediate certificate, which was included in the downloaded certificate .zip file, into the Local Computer certificate store, the audio calling issue was resolved. In summary, always check the intermediate SSL certificates for your chosen provider, specifically if you are not utilising one of the bigger SSL vendors such as VeriSign or DigiCert. Hope this helps!

Lync Server 2013 – Cannot Find Any Suitable Disks For Database Files

I was adding a new front end pool to a customer’s Microsoft Lync Server 2013 deployment recently when I came across an issue while attempting to implement pool pairing between two Standard Edition servers. On completion of the backup service MSI installation, I received the error “Command execution failed: Cannot find any suitable disks for database files. You must manually specify database paths.”, as illustrated below.

[Screenshot: Database Error]

It became apparent quite quickly that this issue was related to the available storage on the virtual machine’s C:\ drive. While there was sufficient space to perform the front end installation, the addition of an extra component was a step too far in terms of required storage. In this case, the customer had provisioned a 50 GB local hard disk drive, which is under the recommended 72 GB required for the deployment. On expanding the local hard disk drive to 80 GB and running the Lync Server Deployment Wizard an additional time, the error no longer appeared. This was a good example of why any virtual or physical machines utilised for Microsoft Lync Server 2013 roles need to conform to the minimum hardware requirements in order to avoid potentially time consuming issues. That’s it!
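A quick pre-flight check of the disk size before running the deployment wizard could have saved some time here. The sketch below is one possible way to do that in Python, using the 72 GB figure quoted above as the threshold. The function names are made up for illustration, and treating 1 GB as 1024^3 bytes is an assumption.

```python
import os
import shutil

GIB = 1024 ** 3
RECOMMENDED_GB = 72  # figure quoted in the post above


def meets_requirement(total_bytes: int, required_gb: int = RECOMMENDED_GB) -> bool:
    """Return True when a disk of total_bytes is at least required_gb in size."""
    return total_bytes >= required_gb * GIB


def check_drive(path: str) -> bool:
    """Check the total size of the volume that holds the given path."""
    usage = shutil.disk_usage(path)
    return meets_requirement(usage.total)


if __name__ == "__main__":
    root = os.path.abspath(os.sep)  # root of the current drive, e.g. C:\ on Windows
    verdict = "meets" if check_drive(root) else "is below"
    print(f"{root} {verdict} the {RECOMMENDED_GB} GB recommendation")
```

With the numbers from this deployment, the original 50 GB disk fails the check and the expanded 80 GB disk passes it.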

Lync Server 2013 – Cannot Setup Mirroring Database

I recently implemented a Lync Server 2013 infrastructure with a mirrored Microsoft SQL backend. During the configuration of the mirror, a network interruption occurred which resulted in the creation of the mirror database failing. Once the network issue had been resolved, I attempted to recreate the mirror via the “Install Databases” option in the topology builder to find the process failed with the following error:

“Cannot setup mirroring because there is an error when validating the current database states on primary and mirror instances for the database “rtcxds”. Exception: System.InvalidOperationException: Cannot setup mirror database “rtcxds” because it already exists.”

Despite manually attempting to create the mirror via the Lync Server Management Shell, the error persisted. In order to resolve the problem, the following was performed.

1. Open the SQL Management Studio on the primary database server and locate the affected database, in my case it was “rtcxds”.

2. Right click the database and click “Detach” and then click OK when the following window is displayed.

3. Open the SQL Management Studio on the mirror database server and locate the affected database.

4. Right click the database and click “Delete”. You won’t be able to detach this database as it is not the primary.

5. Remove the created database files on each SQL server and then open the Lync Server topology builder.

6. In the topology builder right click “Lync Server 2013” and then click “Install Databases”.

7. Follow the install databases wizard and the mirror should now deploy correctly.

Exchange 2007 Unified Messaging & Lync Server 2013 – Key Mapping Issue

I recently deployed a Microsoft Lync Server 2013 infrastructure for a customer running Microsoft Exchange Server 2007 SP3, and while this version of Exchange Server is supported, it should be noted there are a few additional amendments that need to be made in order to make UM Auto Attendant key mappings to Lync extensions work correctly. Following the typical UM integration through the use of OCSUMUtil.exe and ExchUMUtil.ps1, all Unified Messaging functionality seemed to be working correctly, including dial by extension. It wasn’t until a key mapping was added to an Auto Attendant to transfer a call to a specific Lync extension that I saw an issue. When calling the attendant and pressing one, for example, which was directed to extension 319, the Unified Messaging service would produce the following error and the attendant would tell the caller “The call could not be transferred”.

[Screenshot: UM Error]

As you can see, there is not much detail to go on, and researching this particular Event ID suggested the error could pertain to a number of issues. On researching further, however, I noted the following from the TechNet article on integrating Lync Server 2013 with Exchange Unified Messaging:

If you are using a version of Exchange that is earlier than Microsoft Exchange Server 2010 SP1, you must enter the fully qualified domain name (FQDN) of the corresponding Exchange Unified Messaging (UM) SIP dial plan in the Lync Server 2013 dial plan Simple name field. If you are using Microsoft Exchange Server 2010 SP1 or latest service pack, this dial plan name matching is not necessary.

In order to resolve the key mapping issue the following was performed.

1. Connect to the Lync Server 2013 control panel and click Voice Routing and then select the Dial Plans tab.

2. Double click the “Global” dial plan to edit it and in the Simple Name dialog box, remove the word Global and replace it with the name of your Exchange Unified Messaging dial plan followed by your internal Active Directory domain name. For example, if my UM Dial Plan name was “DefaultUM” and my internal domain was “company.local”, I would enter DefaultUM.company.local into the Simple Name field.

3. Click OK and then commit the change. You will then need to wait a few moments for the change to take effect before trying the key mapping again. It should also be noted that the Global dial plan needs sufficient normalisation rules for the key mapping to work when transferring to an extension. In my case the dial plan now looked like the following:

[Screenshot: Global Dial Plan]

That’s it, hopefully your Auto Attendant key mapping issues to Lync extensions will now be resolved.

Lync Server 2010/2013 Response Group Holiday Sets

I recently performed a Lync Server 2010 deployment for an organisation with a branch office in Aberdeen, Scotland. As their bank holidays vary slightly in comparison to England and Wales, I created a Response Group Holiday Set for Scotland. The original script layout credit goes to UnifiedMe.

1. Connect to your Microsoft Lync Server 2010/2013 front end server.

2. Open the Microsoft Lync Server Management Shell and paste the following contents in their entirety. Before running the command, replace the server name in the “ApplicationServer” section with the FQDN of your front end pool.

# One holiday object per bank holiday. Dates are written day/month/year, as in the original script.
$a = New-CsRgsHoliday -StartDate "06/05/2013 12:00 AM" -EndDate "07/05/2013 12:00 AM" -Name "2013 Early May"
$b = New-CsRgsHoliday -StartDate "27/05/2013 12:00 AM" -EndDate "28/05/2013 12:00 AM" -Name "2013 Spring"
$c = New-CsRgsHoliday -StartDate "05/08/2013 12:00 AM" -EndDate "06/08/2013 12:00 AM" -Name "2013 Summer"
$d = New-CsRgsHoliday -StartDate "02/12/2013 12:00 AM" -EndDate "03/12/2013 12:00 AM" -Name "2013 St Andrew's Day"
$e = New-CsRgsHoliday -StartDate "26/12/2013 12:00 AM" -EndDate "27/12/2013 12:00 AM" -Name "2013 Boxing Day"
$f = New-CsRgsHoliday -StartDate "25/12/2013 12:00 AM" -EndDate "26/12/2013 12:00 AM" -Name "2013 Christmas Day"
$g = New-CsRgsHoliday -StartDate "01/01/2014 12:00 AM" -EndDate "02/01/2014 12:00 AM" -Name "2014 New Years Day"
$h = New-CsRgsHoliday -StartDate "02/01/2014 12:00 AM" -EndDate "03/01/2014 12:00 AM" -Name "2014 2nd January"
$i = New-CsRgsHoliday -StartDate "18/04/2014 12:00 AM" -EndDate "19/04/2014 12:00 AM" -Name "2014 Good Friday"
$j = New-CsRgsHoliday -StartDate "05/05/2014 12:00 AM" -EndDate "06/05/2014 12:00 AM" -Name "2014 Early May"
$k = New-CsRgsHoliday -StartDate "26/05/2014 12:00 AM" -EndDate "27/05/2014 12:00 AM" -Name "2014 Spring"
$l = New-CsRgsHoliday -StartDate "04/08/2014 12:00 AM" -EndDate "05/08/2014 12:00 AM" -Name "2014 Summer"
$m = New-CsRgsHoliday -StartDate "01/12/2014 12:00 AM" -EndDate "02/12/2014 12:00 AM" -Name "2014 St Andrew's Day"
$n = New-CsRgsHoliday -StartDate "26/12/2014 12:00 AM" -EndDate "27/12/2014 12:00 AM" -Name "2014 Boxing Day"
$o = New-CsRgsHoliday -StartDate "25/12/2014 12:00 AM" -EndDate "26/12/2014 12:00 AM" -Name "2014 Christmas Day"
$p = New-CsRgsHoliday -StartDate "01/01/2015 12:00 AM" -EndDate "02/01/2015 12:00 AM" -Name "2015 New Years Day"
$q = New-CsRgsHoliday -StartDate "02/01/2015 12:00 AM" -EndDate "03/01/2015 12:00 AM" -Name "2015 2nd January"
$r = New-CsRgsHoliday -StartDate "03/04/2015 12:00 AM" -EndDate "04/04/2015 12:00 AM" -Name "2015 Good Friday"
$s = New-CsRgsHoliday -StartDate "04/05/2015 12:00 AM" -EndDate "05/05/2015 12:00 AM" -Name "2015 Early May"
$t = New-CsRgsHoliday -StartDate "25/05/2015 12:00 AM" -EndDate "26/05/2015 12:00 AM" -Name "2015 Spring"
$u = New-CsRgsHoliday -StartDate "03/08/2015 12:00 AM" -EndDate "04/08/2015 12:00 AM" -Name "2015 Summer"
$v = New-CsRgsHoliday -StartDate "30/11/2015 12:00 AM" -EndDate "01/12/2015 12:00 AM" -Name "2015 St Andrew's Day"
$w = New-CsRgsHoliday -StartDate "28/12/2015 12:00 AM" -EndDate "29/12/2015 12:00 AM" -Name "2015 Boxing Day"
$x = New-CsRgsHoliday -StartDate "25/12/2015 12:00 AM" -EndDate "26/12/2015 12:00 AM" -Name "2015 Christmas Day"
# Group the holidays into a single set on the pool. Replace servername.domain.local with your front end pool FQDN.
New-CsRgsHolidaySet -Parent ApplicationServer:servername.domain.local -Name "Scotland Bank Holidays" -HolidayList ($a,$b,$c,$d,$e,$f,$g,$h,$i,$j,$k,$l,$m,$n,$o,$p,$q,$r,$s,$t,$u,$v,$w,$x)

3. In the Lync Control Panel click “Response Groups” and then click to create or edit a workflow. Once the web site loads, select the hunt or interactive response group that you need to apply the holiday set to and click Edit. In the workflow editor, under the “Specify Your Holidays” section, you should now see the Response Group Holiday Set name displayed. Check the holiday set, configure your preferred call routing method for the days contained within the created set and click Save at the bottom of the workflow once you have finished.

That’s it, the holiday set will now be active.
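When adapting the script for later years it is easy to mistype a date, so a quick sanity check of the start and end values can help before pasting anything into the shell. The Python sketch below is only an illustration and is not part of the Lync tooling. It assumes the day/month/year layout used in the script above and an English locale for the AM/PM marker, and the helper names are invented.

```python
from datetime import datetime, timedelta
from typing import List

# Day/month/year with a 12 hour clock, as written in the script above.
DATE_FORMAT = "%d/%m/%Y %I:%M %p"


def parse_holiday(text: str) -> datetime:
    """Parse a value such as '06/05/2013 12:00 AM'."""
    return datetime.strptime(text, DATE_FORMAT)


def check_holiday(name: str, start: str, end: str) -> List[str]:
    """Return a list of problems found for one holiday (empty if none)."""
    problems = []
    begins, ends = parse_holiday(start), parse_holiday(end)
    if ends - begins != timedelta(days=1):
        problems.append(f"{name}: does not span exactly one day")
    return problems


if __name__ == "__main__":
    holidays = [
        ("2013 Early May", "06/05/2013 12:00 AM", "07/05/2013 12:00 AM"),
        ("2013 Spring", "27/05/2013 12:00 AM", "28/05/2013 12:00 AM"),
    ]
    for name, start, end in holidays:
        issues = check_holiday(name, start, end)
        day = parse_holiday(start).strftime("%A")
        print(name, day, issues if issues else "OK")
```

Printing the weekday alongside each entry makes it easier to spot a holiday that has accidentally landed on the wrong day.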

Lync 2013 EWS With Forefront TMG 2010 Issues

I recently performed a Microsoft Lync Server 2013 migration and following this process I noted that Lync 2013 clients connecting from external networks were continually prompted for Outlook authentication in order to retrieve data from Exchange Web Services (EWS). After investigating the issue, it appears this occurs when forms based authentication is used for the /Autodiscover/* and /EWS/* virtual directories on an SSL web listener in Forefront TMG 2010. In order to resolve the issue, a separate global IP address was obtained and assigned to a web listener that does not perform any pre-authentication and simply passes the authentication request directly to Exchange Server 2010. The reason a separate global IP address and web listener were utilised is that, should you be using a single web listener for all Exchange services, you would need to disable forms based authentication for OWA, Outlook Anywhere and Exchange ActiveSync as well. In most environments this would not be a desirable approach, whereas using a separate listener purely for autodiscover and EWS satisfies most security requirements. The following steps were performed in Forefront TMG 2010 to resolve the issue.

1. In Forefront TMG 2010 right click “Firewall Policy” -> “New” -> “Exchange Web Client Access Publishing Rule”.

2. When the wizard invokes enter a name for the publishing rule such as “Exchange Web Services” and click Next.

3. On the select services page click the drop down item and select the appropriate version of Exchange Server for your environment, and then check to select “Outlook Anywhere (RPC\HTTP(s))” and then also select “Publish additional folders on the Exchange Server for Outlook 2007 clients” and then click Next.

4. On the Publishing Type page ensure that “Publish a single website or load balancer” is checked and click Next. On the following Server Connection Security page select “Use SSL to connect to the published web server or server farm” and click Next.

5. On the Internal Publishing Details page in the Internal site name dialogue box enter the fully qualified domain name of your Exchange Client Access Server and then click Next.

6. On the Public Names page enter the fully qualified domain name used for external autodiscover, for example autodiscover.domain.com, and then click Next.

7. On the Select Web Listener page click “New”, enter an appropriate name for the listener and click Next. On the following page select “Require SSL secured connections with client” and then click Next.

8. On the Web Listener IP address page, click External and then “Select IP Addresses” and continue to select the new global IP address that is to be used for Exchange Web Services and then click Next.

9. On the Listener SSL Certificate page click “Select Certificate” and then choose your third party SSL certificate for Exchange Services, this certificate must include the subject alternative name of autodiscover.domain.com, once selected click Next.

10. On the Authentication Settings page click the drop down item and select “No Authentication” and then click Next, click Next again past the Single Sign On page and then click Finish.

11. Back in the main publishing rule wizard, ensure the newly created listener is selected and then click Next. On the Authentication Delegation page click the drop down item and select “No delegation but client may authenticate directly” and then click Next, on the following User Sets page click Next and then Finish to create the publishing rule.

12. Once the rule has been created right click it and select properties and select the paths tab and remove the /OAB/* and /rpc/* entries and click OK. Following this change click Apply on the Firewall Policy page and wait for the TMG configuration store to update accordingly.

13. The Exchange Web Services rule is now created and should look like the following.

[Screenshot: Web Publishing Rule Properties]

14. If the publishing rule has applied correctly, when connecting with your Lync 2013 client externally you should no longer be continually prompted for Outlook credentials. Additionally, under the configuration information section of the client, which can be accessed by holding down the control (Ctrl) key, left clicking the Lync 2013 task tray icon and selecting “Configuration Information”, the EWS status should now say “EWS Status OK”.

That’s it, hopefully your EWS external access now works as intended.