Deep into the Web: a Brief Guide for Recruiters


Back in 2000, I worked on a software product that evaluated potential drugs by applying algorithms to organic chemistry data. Why was the product important to customers at drug development companies? First, the program could explore many more potential drug formulas than any of the older methods. Second, a clinical test of a potential drug is very time-consuming and expensive. If software can reject an unpromising drug before testing, the savings in time and money are huge. Only the really promising drugs then go on to real-life tests.

I see an analogy there. I think this is really what online sourcing is about:

1) Finding more potential candidates.

2) Spending less time and money screening unqualified people and more time talking to the qualified people.

Do online sourcing right, and you are more often on the phone with the right people!

Just a few years ago, online sourcers mostly used Boolean search on the web and on Job Boards. These days the set of tools is expanding, and it’s hard to keep up with the new technologies. Some sourcing tool vendors promise to eliminate any need to learn complex search syntax. Sounds great! However, the abundance of tools and sites seems only to complicate the sourcing process for many. To select the right sourcing methods, we need some understanding of the ideas and technologies behind the tools. Only with this understanding can the two advantages above, 1) and 2), be achieved. So I’d like to walk briefly through some modern terms and technologies, to try and help you navigate. (While I will provide examples, my goal is to explain concepts, not to give an overview of existing tools.)

Here are the terms I will talk about:

  • Semantic Search
  • Deep Web Search
  • Real Time Search

Semantic Search

Semantic means “related to meaning.” Semantic search engines can be either general or specialized.

General semantic search engines have a really hard task: making sense of very broad and very large sets of data. Many of the general semantic engines are subjects of research papers and conference talks; few are of practical use to sourcers for now. For example, what Cluuz made of my search for a specific job title looks fun but is hardly relevant to efficient sourcing. (Cluuz seems useful for people search, though; check it out.)

Before I move on to talk about specialized tools, I would like to recommend this cool general semantic search site: Kngine (kngine.com). It does an excellent job answering questions about terminology, concepts, industry news, and more.

Tools that concentrate on specific tasks, like the task of finding profiles and resumes, can do much better than general tools in providing meaningful results. They are often called vertical search engines. Pipl.com is an example of a vertical search engine.

The sheer volume of data is an obstacle to the performance of semantic search engines, even the specialized ones. Here are two ways to implement semantic search:

1) Create a tool that queries a database. Searching within a database, even a large one, is technically much easier than searching the web. The recently added Monster power search can find people who are not job hoppers, or who come from a top school. Not bad!

2) Pick a set of data from the web based on an initial search, with possible false positives, and then parse, sort and filter this set. There are excellent tools for sourcers that work this way, such as Broadlook Diver and eGrabber ResumeGrabber. This methodology is also included in some CRM and ATS systems along with other functionality.


There are Google search tricks that can provide elements of this second (“staged”) approach even without a specialized tool. These tricks force Google to display critical parts of the found documents, so that you can filter without extra clicks.

As an example, add the substring “email * * com” or “contact * * com” to your typical Google search for resumes, and you will see (and can potentially collect) email addresses right in Google’s results pages. Here is an example of filtering: if the contact email address in a resume ends in .au, and you are searching for candidates in the UK, this resume is a false positive.
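The filtering step just described can be sketched in a few lines of Python. This is only an illustration: the snippets, names, and addresses are made up, and the regular expression is a rough pattern for email-like strings, not a full validator.

```python
import re

# Hypothetical result snippets, as they might appear on a results page.
SNIPPETS = [
    "Jane Doe, Java developer. Email: jane.doe@example.co.uk",
    "John Roe, Java developer. Contact: john.roe@example.com.au",
    "No contact details in this snippet",
]

# Rough pattern for email-like strings (an assumption, not a strict validator).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return all email-like strings found in the text."""
    return EMAIL_RE.findall(text)


def is_false_positive(email, unwanted_suffixes=(".au",)):
    """Flag an address whose domain ends in a country suffix we are not targeting."""
    return email.lower().endswith(tuple(unwanted_suffixes))


def keep_candidates(snippets):
    """Collect addresses from all snippets, dropping the false positives."""
    kept = []
    for snippet in snippets:
        for email in extract_emails(snippet):
            if not is_false_positive(email):
                kept.append(email)
    return kept


print(keep_candidates(SNIPPETS))  # ['jane.doe@example.co.uk']
```

An email suffix is only a hint about location, so a filter like this is best treated as a first pass before a human look.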

Existing semantic search tools for recruiters may also be based on one or more of the following; this list can be extended:

1) Searching for proximity of keywords in a document such as a resume; “managed” near “people” probably means that we have found a manager.

2) Recognizing synonyms and abbreviations and using them to search. Engineer is like developer; PwC is the same as PricewaterhouseCoopers.

3) Making assumptions about keyword weights (i.e. importance) by parsing a job description and/or allowing the user to add weights to keywords. The title and any synonyms for the title are more important than the skills listed in a job description; must-haves are more important than nice-to-haves.

4) Working based on keyword clouds. This means knowing which words often appear together and adding those words to a search, to help target the best results. As an example, known certifications in an industry may be added as extra keywords in a search for candidates.
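To make methods 1) and 2) concrete, here is a minimal Python sketch of a proximity check combined with synonym expansion. The synonym table, the function names, and the five-word window are all assumptions made for illustration; this is not how any particular sourcing tool works.

```python
import re

# Hypothetical synonym table; a real tool would maintain a much larger one.
SYNONYMS = {
    "engineer": {"engineer", "developer"},
    "pwc": {"pwc", "pricewaterhousecoopers"},
}


def tokenize(text):
    """Split text into lowercase words and numbers."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand(term):
    """Return the term together with any known synonyms."""
    return SYNONYMS.get(term.lower(), {term.lower()})


def near(text, first, second, max_distance=5):
    """True if a form of `first` occurs within max_distance words of a form of `second`."""
    words = tokenize(text)
    first_pos = [i for i, w in enumerate(words) if w in expand(first)]
    second_pos = [i for i, w in enumerate(words) if w in expand(second)]
    return any(abs(i - j) <= max_distance for i in first_pos for j in second_pos)


resume = "Senior developer at PricewaterhouseCoopers. Managed a team of 12 people."
print(near(resume, "managed", "people"))  # True
print(near(resume, "engineer", "pwc"))    # True, thanks to the synonym table
```

The window size is a judgment call: too small and real managers are missed, too large and unrelated words start to look connected.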

While I can suggest Google search tricks for methods 1) and 2) above, there’s no way to tell any major web search engine about the weights of keywords (method 3). As for method 4), it’s clear that having a customized keyword cloud built into your semantic search tool would be a big advantage for the user. I have seen a demo of Pure Discovery that works with clouds of words.
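Keyword weights can still be applied after the search, to a set of resumes you have already collected. The Python sketch below shows the idea under simple assumptions: the weights are invented for the example, and a resume’s score is just the sum of the weights of the phrases it contains.

```python
import re

# Hypothetical weights: the title counts most, then must-haves, then nice-to-haves.
WEIGHTS = {
    "project manager": 5.0,  # job title
    "pmp": 3.0,              # must-have certification
    "agile": 1.0,            # nice-to-have
}


def score(resume_text, weights=WEIGHTS):
    """Sum the weights of all phrases that appear in the resume as whole words."""
    text = resume_text.lower()
    total = 0.0
    for phrase, weight in weights.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            total += weight
    return total


def rank(resumes, weights=WEIGHTS):
    """Return the resumes sorted from the highest score to the lowest."""
    return sorted(resumes, key=lambda r: score(r, weights), reverse=True)


resumes = [
    "Agile coach",
    "Project manager, construction",
    "Project Manager with PMP certification and Agile experience",
]
for r in rank(resumes):
    print(score(r), r)
```

A parsed job description could supply the phrases, with the user adjusting the weights by hand.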

Great as they are, the semantic tools need to understand Boolean logic to be useful. The Boolean AND, OR and NOT do not really present a difficulty for anybody I have ever met. (What’s difficult is not the Boolean logic but the advanced operators.) Monster’s sleek new tool does not support Boolean, and that’s a problem for me. I can’t, for example, exclude candidates who work for employers that I am not allowed to solicit candidates from, such as my own clients.
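Here is a minimal Python sketch of why NOT matters in the exclusion case above. The resumes and the company names are invented, and matching is a plain case-insensitive substring test, which is much cruder than what a real search engine does.

```python
def contains(text, term):
    """Case-insensitive substring test; a simplification of real keyword matching."""
    return term.lower() in text.lower()


def matches(text, all_of=(), any_of=(), none_of=()):
    """all_of terms are ANDed, any_of terms are ORed, none_of terms are NOTed."""
    return (
        all(contains(text, t) for t in all_of)
        and (not any_of or any(contains(text, t) for t in any_of))
        and not any(contains(text, t) for t in none_of)
    )


# Hypothetical resumes; "Acme Corp" stands in for a client we may not solicit from.
RESUMES = [
    "Java developer, currently at Acme Corp",
    "Java engineer, currently at Initech",
    "Python developer, currently at Initech",
]


def shortlist(resumes):
    """Equivalent of: java AND (developer OR engineer) NOT "acme corp"."""
    return [
        r for r in resumes
        if matches(r, all_of=["java"], any_of=["developer", "engineer"],
                   none_of=["acme corp"])
    ]


print(shortlist(RESUMES))  # ['Java engineer, currently at Initech']
```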

As of January 2010, I have not seen a single semantic tool for recruiters that I’d recommend as the tool of choice. Also, no matter what tools we pick, to remain competitive we all need to keep sites like Google, Yahoo, and Bing in our toolboxes for a while. (Include Yahoo as long as Yahoo’s own search is still around.)

Deep Web Search

Search engines crawl the web by going from a web page to all of the links on that page, saving their content, and then doing the same with every page they find. (This is a bit simplified.) The pages found this way are called the surface web. The pages that cannot be found this way are called, by definition, the Deep Web.
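The crawling idea can be shown on a toy example. The Python sketch below uses a made-up, in-memory “web” (a dictionary of page names and their links) instead of real pages, and a plain breadth-first walk; real crawlers are far more involved.

```python
from collections import deque

# Hypothetical tiny "web": each page maps to the pages it links to.
LINKS = {
    "home": ["about", "jobs"],
    "about": ["home"],
    "jobs": ["job-1"],
    "job-1": [],
    "members-only-profile": ["home"],  # no page links to this one
}


def crawl(start, links):
    """Follow links breadth-first from the start page; return every page reached."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


surface = crawl("home", LINKS)
deep = set(LINKS) - surface

print(sorted(surface))  # ['about', 'home', 'job-1', 'jobs']
print(sorted(deep))     # ['members-only-profile']
```

In this toy model, a page ends up “deep” simply because no link leads to it, even though it exists and links out to other pages.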

I saw a tweet from somebody who wanted to delay sourcing on the Deep Web until she masters the surface web. However, you are already accessing the Deep Web if you have at least one account at a social network or a Job Board. Google has no password to Monster or CareerBuilder, and cannot search them, but you can.

 

Most Social Media sites have both a Deep Web and a surface web component. As an example, a LinkedIn profile shows different data to logged-in LinkedIn users and to logged-out visitors (including search engines, which are naturally “logged out”). There are two good reasons to learn which parts of membership-based sites belong to the Deep Web: to figure out 1) complementary ways to source and 2) ways to cross-reference data from different sources. Interestingly, Facebook has recently changed which parts of it are deep and which are surface; this makes a big difference for sourcers.

Along with membership-based, password-protected sites, other pages that belong to the Deep Web are: pages that are generated on the fly, based on users’ queries to databases; pages that no other page links to; pages in odd formats that cannot be read by search engines; and pages that explicitly ask search engines not to index them.
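One common way for a site to ask search engines to stay away from certain pages is a robots.txt file. The Python sketch below uses the standard library’s `urllib.robotparser` to read such a file and check two URLs against it. The file contents and the example.com URLs are made up for the illustration, and robots.txt is only one of several mechanisms sites use.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: all crawlers are asked to skip anything under /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(ROBOTS_TXT, "https://example.com/jobs/1"))             # True
print(allowed(ROBOTS_TXT, "https://example.com/private/resume-42"))  # False
```

Well-behaved search engines honor these requests, which is why such pages stay out of their results.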

Real Time Search

If you are on Twitter, the easiest way to experience real time search is to search on its home page. It accepts advanced Boolean syntax, by the way. A large and growing number of sites search across Social Media and pick up tweets, blogs, posts, news, chats, etc. in real time.

Did you know that Google now searches in real time, and so does Bing? Try looking for the “Updates” option on Google and you will see dynamically updated results. The search engines have recently made deals with Twitter, Facebook, and a few other major networks to be able to do that.

Real time search on Google is new technology, and I feel it’s not surface web (not crawled) and not quite deep web (since most of these results can be found by crawling, though much later). I’d say real time search has become a third kind of search, so we now have:

  1. Surface (pages found by crawling)
  2. Deep Web (pages that cannot be found by crawling)
  3. Real Time (pages that can be found by crawling but are found faster by getting them directly from target sites)

Real time search is critical for sourcers who want to be ahead of the competition. If you haven’t used Google Alerts before, it’s time to try them.

Selecting the Right Tools

When you choose sourcing tools and sites, it’s important to figure out where the data is coming from. And no, not “everything is picked from LinkedIn now” (as I read in a forum post recently). We can use the Semantic, Deep Web, and Real Time notions to compare tools. Selecting only one tool may deprive you of useful sources.

Here are a few examples. Zoominfo crawls the web; its data is organized in part as surface web (visible to search engines) and in part as deep web (visible with a membership). Jigsaw has data entered by its members; some of its data is visible to all and some is hidden. All company data in Jigsaw is on the surface web, and no membership is required to see it. Pipl.com gets results dynamically from the surface as well as from the deep web. Compare Jigsaw, Zoominfo, and Pipl data against LinkedIn’s and you get validated data, which will let you call more target people. As for semantic sourcing tools, give them a good trial run before you buy, and check what the technology does and does not do.

Happy sourcing!


About the author

Irina is an Executive and Technical Recruiter, an Expert Sourcer and a Web/Social Media Researcher and Trainer. For the past five years she has been a Partner with Brain Gain Recruiting, placing senior full-time employees in software development, IT, ERP, strategy consulting, and finance. She has an MS in Mathematics and a strong technical background. Irina is the Winner of the SourceCon Challenge in 2009. Read about Irina’s training webinars and DVDs on her blog Boolean Strings + Social Media. Irina runs several active recruiter online communities, including the Boolean Strings Network. Here is Irina’s LinkedIn Profile. Follow her on Twitter at @braingain.

Noel Cocca
CEO/Founder RecruitingDaily and avid skier, coach and avid father of two trying to keep up with my altruistic wife. Producing at the sweet spot talent acquisition to create great content for the living breathing human beings in recruiting and hiring. I try to ease the biggest to smallest problems from start-ups to enterprise. Founder of RecruitingDaily and our merry band of rabble-rousers.

