Making The LDAP Connection

Just like any database, the connection is crucial. No Ticky, No Laundry.
This page will discuss connections in an Active Directory (AD) environment, but much of it may cross over to other directory services models.
There are a few things to consider about the connection, one of them being replication.
Replication
This is how all of the servers in the directory stay in sync. Data is replicated from one directory server to another; in AD these servers are called domain controllers (DCs). I am not going to go into the details, but some attributes replicate faster than others, some replicate on random schedules, and some replicate only to specific DCs. For instance, if you are checking the lastLoginTime attribute, it initially replicates immediately between the DC the user hit first and the PDC of the domain. That timestamp is then gradually replicated out to the other DCs. The only way to know for sure when the last login happened is therefore to either go directly to the PDC, or go to every DC, gather the dates and pick the latest one. See how getting the correct data can become more of a challenge? Though it may sound simple to go directly to the PDC of the domain, you do not always know which DC is the PDC if you are working with multiple domains.
Writes have a similar problem. If your call writes to one DC and you then immediately read from a different DC, the change may not have replicated to the DC you are reading from yet. This means that a write followed by a read needs to go to the same DC.
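The "go to every DC and pick the highest date" step can be sketched in a few lines. This is only an illustration of the comparison, not of the LDAP calls: it assumes you have already gathered one value per DC, and that the values are Windows FILETIME integers (100-nanosecond ticks since 1601-01-01 UTC), which is how AD typically stores logon timestamps. The function names and DC names are made up for the example.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Windows FILETIME counts 100-nanosecond ticks since 1601-01-01 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(ticks: int) -> datetime:
    """Convert a FILETIME integer to a timezone-aware UTC datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def latest_logon(values_by_dc: dict[str, int]) -> tuple[str, datetime] | None:
    """Return (dc_name, timestamp) for the most recent logon, or None.

    Assumption: a value of 0 means that DC has no logon recorded for the user.
    """
    seen = {dc: ticks for dc, ticks in values_by_dc.items() if ticks > 0}
    if not seen:
        return None
    dc = max(seen, key=seen.get)
    return dc, filetime_to_datetime(seen[dc])


# Hypothetical values gathered from three DCs.
gathered = {
    "dc-ny-01": 116444736000000000,
    "dc-hi-01": 116444736010000000,
    "dc-tx-01": 0,
}
print(latest_logon(gathered))
```

The DC name is returned along with the date because it tells you which DC actually handled the most recent login.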
Access
Access plays a big part in the connection. You must have authorization to connect (bind) to the domain and make a call/query. Normally this is done with a service account. Some domains allow any authenticated user to make an LDAP query and some do not. You need to know which applies to you, or you may not be able to connect.

The call
Once you confirm you have access to the domain, you can bind to it and make an LDAP call by using a query, just like SQL. In its simplest form, a standard LDAP connection string looks something like this (LDAP:// + DC + / + Base OU):
LDAP://mydomain.com/DC=mydomain,DC=com
This is where my previous page on objects comes into play. If you know that user objects are under an Accounts OU, you want to go directly to that OU to query the data. It is also why I say it is good to have a top-down overview of what the domain looks like, so your queries can be written efficiently by starting in the correct part of the tree for your search:
LDAP://mydomain.com/OU=Accounts,DC=mydomain,DC=com
(This is somewhat similar to a connection string in SQL.)
Pay attention to the domain part of the LDAP connection string (the mydomain.com right after LDAP://). Using the domain is not always the optimal connection, and you will need to check with your AD team to find out what is. Many times the domain is behind a load balancer.
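The path format above is just string assembly, so it can be sketched without touching a directory. The helper names below are made up for illustration, and the sketch assumes the usual AD convention that the base DN mirrors the DNS name of the domain, and that OU names contain no characters that need escaping in a DN.

```python
from __future__ import annotations


def domain_to_base_dn(dns_name: str) -> str:
    """Turn 'mydomain.com' into 'DC=mydomain,DC=com'."""
    labels = [label for label in dns_name.strip(".").split(".") if label]
    if not labels:
        raise ValueError("empty domain name")
    return ",".join(f"DC={label}" for label in labels)


def ldap_path(server: str, dns_name: str, ou_chain: list[str] | None = None) -> str:
    """Build 'LDAP://server/OU=...,DC=...,DC=...'.

    ou_chain lists OUs from the innermost outwards, e.g. ['Accounts'].
    Assumption: OU names are plain text with no DN special characters.
    """
    parts = [f"OU={ou}" for ou in (ou_chain or [])]
    parts.append(domain_to_base_dn(dns_name))
    return f"LDAP://{server}/{','.join(parts)}"


print(ldap_path("mydomain.com", "mydomain.com"))
print(ldap_path("mydomain.com", "mydomain.com", ["Accounts"]))
```

Keeping the server and the domain name as separate arguments matters later on: the server part is the piece you may want to swap for a load balancer name or a specific DC.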
Consider this:
Your domain is behind DNS. This means that if you are sitting in New York and want to query for a user, you may hit a New York domain controller... or you may not. If DNS uses a round robin approach, your first call may be to a DC in New York, your next call may be to a DC in Hawaii and you start to have latency issues. See how that can become a problem?
If your domain is behind a load balancer, the load balancer will take note of where you are calling from and try to keep all of your calls within a specific region, selecting the fastest routes for your calls. The load balancer will have a different DNS name you can call, like mydomaindc.com. This forces the calls to be intercepted by the load balancer first and gets you to where you need to go efficiently. This can, however, be problematic if the load balancer bounces you around between DCs. Normally there is a sticky setting on the load balancer that keeps your initial connection and only switches if there is a problem.

Leveraging AD Sites
Similar to the load balancer solution, my preferred method is to use the Directory Services library and let AD pick a domain controller. This way your call is kept within an AD site. It also fixes the write/read problem, because you are letting AD handle the search for the best domain controller (DC) for the IP address of the machine you are coming from. AD Sites are set up in AD by picking the IP addresses of domain controllers that are optimal and creating boundaries. This can be set up manually by the domain administrators, or AD itself can be allowed to pick the optimal paths. I am not going to go into this, but you should be aware of it. The way this method works is that you first query AD using GetDC() in DirectoryServices for the domain you are hitting. AD will in turn pick the optimal DC for your connection. Once you have the best DC, you simply insert it into your LDAP connection string. The call would equate to: LDAP://BestDC/OUPath.
Now when you make your write call, it is written to that DC, and if you instantly read back the data, you are hitting the same DC that you wrote to, so replication is no longer part of the picture. I will post the source code in C# for making these calls soon. It is fairly simple, but understanding what the code is doing is key to getting what you expect.
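Until the C# version is posted, here is a minimal Python sketch of the idea: look up the best DC once, then reuse it for both the write path and the read path. The DC lookup is passed in as a plain function because the real call depends on your platform and library. The class name, the locate_dc parameter and the fake locator are all hypothetical and stand in for whatever your Directory Services library provides.

```python
from __future__ import annotations

from typing import Callable


class PinnedDirectory:
    """Resolve a DC once and reuse it, so a write and the read after it hit the same server."""

    def __init__(self, domain: str, locate_dc: Callable[[str], str]) -> None:
        self._domain = domain
        self._locate_dc = locate_dc  # hypothetical: wraps your real DC locator call
        self._dc: str | None = None

    @property
    def dc(self) -> str:
        # Only ask for a DC the first time; every later call reuses the answer.
        if self._dc is None:
            self._dc = self._locate_dc(self._domain)
        return self._dc

    def path(self, dn: str) -> str:
        """Build LDAP://BestDC/OUPath for the pinned DC."""
        return f"LDAP://{self.dc}/{dn}"


def fake_locator(domain: str) -> str:
    # Stand-in for the real lookup; always answers with the same made-up DC.
    return "dc-ny-01." + domain


directory = PinnedDirectory("mydomain.com", fake_locator)
write_path = directory.path("OU=Accounts,DC=mydomain,DC=com")
read_path = directory.path("OU=Accounts,DC=mydomain,DC=com")
print(write_path == read_path)
```

If the pinned DC goes down, you would need to clear the cached name and look up a new one. The sketch leaves that out.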

The second half of the LDAP path
Now that you have the first half of the path, LDAP://DOMAIN, you need to figure out the second half. There is a string called rootDSE that you can sometimes use (LDAP://domain/rootDSE), but it is not guaranteed to work and it will always start you out at the root of the tree. This is not advisable, because you will be searching from the root down and some domain administrators block this ability. It also means that you will be searching across OUs that may not even contain the object type you are looking for, which is a total waste of resources and time. If the domain is not organized by object classes, however, it may be unavoidable. This is in the hands of the domain administrators and you have no control over it. This is why you should always include the object class in your queries!
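Including the object class comes down to wrapping your condition in an AND filter. The sketch below builds such a filter string and escapes the special characters that the LDAP filter syntax (RFC 4515) reserves. The function names are made up, and sAMAccountName is used only as an example attribute.

```python
from __future__ import annotations

# Characters that must be escaped inside an LDAP filter value (RFC 4515).
_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\0": r"\00",
}


def escape_filter_value(value: str) -> str:
    """Escape a literal value so it cannot change the structure of the filter."""
    return "".join(_FILTER_ESCAPES.get(ch, ch) for ch in value)


def class_filter(object_class: str, attribute: str, value: str) -> str:
    """Build '(&(objectClass=...)(attribute=value))' with the value escaped."""
    return f"(&(objectClass={object_class})({attribute}={escape_filter_value(value)}))"


print(class_filter("user", "sAMAccountName", "jsmith"))
```

Because the value is escaped, a search for a literal "*" stays a literal match instead of turning into a wildcard across the whole tree.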

Once you have established the bind and made a call, the world is your oyster!

Limitations
Now on a small scale this is a simple task; on a large scale, not so much. If your user base is a few hundred employees, LDAP calls are not really problematic. If your directory contains hundreds of thousands of users, a search can fail outright because there are just too many objects to return. In most cases, the domain itself puts a cap on the number of objects your query can return, usually somewhere around 1000 to 1500. This is where paging comes in, and again it is quite similar to a database.
You set a start limit and a stop limit and page through the results until you get what you need. Efficient queries can get around this to some extent, for example by requiring a minimum number of characters before a search runs. If you have 500k users and allow a search on the letter "a", chances are your search will time out or not return the data you are expecting. If your domain allows users with the name "a", you will probably need to use paging. In most scenarios the domain administrators will set a minimum length for usernames, and this can get you around using paging, but only if everyone sticks to the rules.
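Here is a rough sketch of a paging loop. One difference from the start/stop description: the standard LDAP paged results control (RFC 2696) works with a page size plus an opaque cookie that the server hands back, rather than numeric offsets, so that is what the sketch models. The fetch_page callable and the fake server are hypothetical stand-ins for whatever your LDAP library exposes.

```python
from __future__ import annotations

from typing import Callable, Iterator


def iter_paged(
    fetch_page: Callable[[int, bytes], tuple[list[dict], bytes]],
    page_size: int = 500,
) -> Iterator[dict]:
    """Yield every entry, asking for one page at a time.

    fetch_page(page_size, cookie) must return (entries, next_cookie).
    An empty cookie from the server means there are no more pages.
    """
    cookie = b""
    while True:
        entries, cookie = fetch_page(page_size, cookie)
        yield from entries
        if not cookie:
            break


def fake_server(total: int) -> Callable[[int, bytes], tuple[list[dict], bytes]]:
    """Stand-in for a real directory: serves `total` made-up users page by page."""

    def fetch(page_size: int, cookie: bytes) -> tuple[list[dict], bytes]:
        start = int(cookie or b"0")
        end = min(start + page_size, total)
        entries = [{"cn": f"user{i}"} for i in range(start, end)]
        next_cookie = str(end).encode() if end < total else b""
        return entries, next_cookie

    return fetch


users = list(iter_paged(fake_server(2500), page_size=1000))
print(len(users))
```

Keep the page size at or below the cap the domain enforces. Otherwise the server may still cut the results short.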
On a web page, where you have lower timeout limits, this can certainly become problematic, and people will push the boundaries wherever they can.
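One cheap defense is to check the search term before it ever reaches the directory. The sketch below rejects terms that are too short once wildcards and whitespace are stripped off. The minimum of 3 characters is an arbitrary example value, not something AD enforces, and the function name is made up.

```python
from __future__ import annotations

MIN_SEARCH_CHARS = 3  # example policy value; pick whatever fits your directory


def validate_search_term(term: str, min_chars: int = MIN_SEARCH_CHARS) -> str:
    """Return the cleaned term, or raise ValueError if it is too short.

    Leading and trailing whitespace and '*' wildcards do not count
    towards the minimum length, so 'a*' is treated as one character.
    """
    cleaned = term.strip().strip("*").strip()
    if len(cleaned) < min_chars:
        raise ValueError(
            f"search term must contain at least {min_chars} characters"
        )
    return cleaned


print(validate_search_term("  smith* "))
```

Running this check in the web layer means a one-letter search fails immediately with a clear message instead of waiting for a timeout.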