To Do: LinkedIn Export

Priority Project #
7 LinkedIn Export #: 4
6.95

Basic Contact scrape - handling names.

  • “David A. Lewis” should be David Lewis
Pending
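
The name rule above could be sketched roughly as follows. This is only an illustration in Python, not the project's code: the function name split_basic_name is hypothetical, and it assumes that any token after the first that is a single letter (with or without a full stop) is a middle initial to be dropped.

```python
import re
from typing import Tuple

# Hypothetical pattern: a lone letter, optionally followed by a full stop ("A" or "A.").
_INITIAL = re.compile(r"^[A-Za-z]\.?$")


def split_basic_name(full_name: str) -> Tuple[str, str]:
    """Return (first_name, last_name), ignoring middle initials."""
    parts = full_name.split()
    if not parts:
        return "", ""
    # Keep the first token, then discard later tokens that are only an initial.
    kept = [parts[0]] + [p for p in parts[1:] if not _INITIAL.match(p)]
    return kept[0], " ".join(kept[1:])


print(split_basic_name("David A. Lewis"))
```

Full middle names are left in the last name here (so "Mary Jane Smith" becomes first name "Mary", last name "Jane Smith"); whether that is the wanted behaviour still needs deciding.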
7.01

The Scraper buttons

  • Currently the code tracks the status: basic details scraped, then full scrape. I think we need to consider how we refresh the basic scrape (check the connections count?) to decide whether a new scrape is required
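
One possible form of the "check the connections count" idea, as a minimal Python sketch. The ScrapeRecord class and its field name are assumptions made up for this example; nothing here reflects the existing data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScrapeRecord:
    # Hypothetical field: the connection count seen when the basic scrape last ran.
    connections_at_last_scrape: int


def basic_rescrape_needed(record: Optional[ScrapeRecord], current_count: int) -> bool:
    """True if there is no earlier scrape or the connection count has changed."""
    if record is None:
        return True
    return current_count != record.connections_at_last_scrape


print(basic_rescrape_needed(ScrapeRecord(1222), 1225))
```

A count comparison is cheap but not complete: if one connection is added and another removed, the count is unchanged and the refresh would be skipped.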

 

So how should it work?

  • First time use:
    • Hide the “Import and Email”, “Export” and “Email” buttons
    • Hide Connections Scraped info button
    • Scrape button should show: 1) Import Connections.  
      • This should pop up the following text box: “Step 1. Our servers will now log in to LinkedIn and pull in the basic details of all your [1,234] connections. …..”

 

 

I have a system that allows users to scrape their LinkedIn connections and export the results as a VCF or XLS. The result is that their LinkedIn connections are in their contact lists, on their phone etc. Technically it works in two passes: first it scrapes the basic details of each of the user's connections, and then the server opens each connection and scrapes all the details in full (education, employment, languages etc). The first pass takes only a few minutes. The second pass can take hours or days depending on the number of connections. To avoid triggering LinkedIn's anti-scraping measures we have to run it in batches with a delay between each batch, hence it can take days.

I need help with the following: design buttons that logically walk a new user through this process. The buttons can appear and disappear as appropriate. What should the buttons do? My thought was that the Scrape button should initially say “Scrape” and pop up the following message: “Step 1. Our servers will now log in to LinkedIn and pull in the basic details of all your [1,234] connections. This will take a few minutes. After that process you will be able to see the basic details (name, photo, current position) of each of your connections in the index here. However, the ‘full scrape’ (resulting in Employment, Education, and other details) will take time (about [3] days), as it is done in batches. Once complete we will email you to let you know (on xxxx@xxx.com) and you can view them all in the index and download a VCF for use in Outlook or an XLS for more detailed analysis. You can log in and check the progress at any time. Would you like us to email you progress reports or simply when complete?”

I was wondering whether I should then show a second button, “2) Initiate full connection-level scrape”, or just go ahead and start the process automatically. I also have a display that shows them the progress so far. Do you agree with that approach, or how could it be made easier? I also need to build a process to update the download. How should I handle that?
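
To make the button question concrete, here is a rough Python sketch of the flow as a small state table. The stage names and the visible_controls function are invented for illustration. It takes the two-button option (showing “2) Initiate full connection-level scrape” after step 1) purely as an assumption, and it leaves the “Import and Email” button hidden throughout because its place in the flow is not yet decided.

```python
from enum import Enum, auto
from typing import List


class Stage(Enum):
    NOT_STARTED = auto()   # first-time use, nothing scraped yet
    BASIC_DONE = auto()    # step 1 finished, full scrape not yet started
    FULL_RUNNING = auto()  # full scrape batches in progress
    FULL_DONE = auto()     # every connection scraped in full


def visible_controls(stage: Stage) -> List[str]:
    """Return the labels of the controls to show at a given stage."""
    if stage is Stage.NOT_STARTED:
        # Everything except the first step stays hidden for a new user.
        return ["1) Import Connections"]
    if stage is Stage.BASIC_DONE:
        return ["2) Initiate full connection-level scrape", "Connections Scraped info"]
    if stage is Stage.FULL_RUNNING:
        # Assumption: only progress is shown while batches are still running.
        return ["Connections Scraped info"]
    return ["Connections Scraped info", "Export", "Email"]


for stage in Stage:
    print(stage.name, visible_controls(stage))
```

If the full scrape is started automatically instead, the BASIC_DONE stage simply disappears and NOT_STARTED moves straight to FULL_RUNNING once step 1 completes.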

 

- Pending
7.01

Users

  • In the User list, show:
    • An icon if they have verified their LinkedIn name and we have the cookie
    • Their number of connections
    • If they have downloaded them
- Pending
7.02

Cookie view

  • Add the list of cookies to the dashboard (was user_password)?
  • Perhaps show in the user list when a user's cookie has not been added
  • Create a help email for users to add their cookie
Pending
7.02

LinkedIn icon

  • Change .gitignore, or find another solution, so that the LinkedIn icon shows
- Complete
7.03

Scrape

  • When a LinkedIn contact has something unusual in their name (e.g. the maiden name in brackets, or a same or PhD in brackets) the scrape fails to split it into first name and last name correctly.
    • Suggestion: delete anything between brackets (including the brackets themselves)
      • See attachment for examples in my Connections list (6 out of 1,222 ‘fail’)
  • The gender scraped from LinkedIn needs to be trimmed to be usable: it currently returns “…….She/Her……”
    • Let's test it
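
Both clean-up steps above can be sketched in a few lines of Python. The function names are hypothetical. The sketch assumes "brackets" means round or square brackets, and that the padding around the pronouns is whitespace, full stops or ellipsis characters (the real padding returned by the scrape should be checked).

```python
import re

# Matches "(...)" or "[...]" together with any whitespace just before it.
_BRACKETED = re.compile(r"\s*[\(\[][^\)\]]*[\)\]]")


def strip_bracketed(name: str) -> str:
    """Delete bracketed text (and the brackets), then tidy the spacing."""
    cleaned = _BRACKETED.sub("", name)
    return " ".join(cleaned.split())


def clean_pronouns(raw: str) -> str:
    """Trim assumed padding characters from both ends of the scraped value."""
    return raw.strip(" .\u2026\t\r\n\u00a0")


print(strip_bracketed("Jane (Smith) Doe"))
print(clean_pronouns("\u2026\u2026.She/Her\u2026\u2026"))
```

Nested brackets are not handled; with only 6 failures out of 1,222 in the attachment, it is probably worth testing against those examples before adding anything more elaborate.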

Settings

  • The settings for MaxLinkedInContacts and LinkedInScrapeBatch do not seem to take effect. We should use these fields to reduce the size of the initial scrape and make testing quicker
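
A guess at how the two settings might be applied, written as a Python sketch. The meaning given to each setting is an assumption: max_contacts (for MaxLinkedInContacts) caps how many connections are scraped at all, and batch_size (for LinkedInScrapeBatch) is the number handled per run. The function name is hypothetical.

```python
from typing import List, Optional


def select_for_scrape(connection_ids: List[int],
                      max_contacts: Optional[int],
                      batch_size: int) -> List[List[int]]:
    """Cap the list at max_contacts, then split it into batches of batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    limited = connection_ids[:max_contacts] if max_contacts else connection_ids
    return [limited[i:i + batch_size] for i in range(0, len(limited), batch_size)]


# For testing: only 5 contacts, 2 per batch.
print(select_for_scrape(list(range(10)), max_contacts=5, batch_size=2))
```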

Popup Message

  • The popup message appears each time you launch Index - “There are 1222 connections. The system pulls data in the batch of 20 per hour. It will take approximately 62 hours or 2 days and 14 hours to completely fetch all the data of 1222 connections. Our system will auto pull data in the interval of 1 hour. You can check status in every one hour. To continue please start the process.”
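
The figures in that popup can be reproduced with a short calculation. This Python sketch uses hypothetical function names and assumes, as the popup appears to, that every batch (including the last) is counted as one full interval.

```python
import math
from typing import Tuple


def estimate_hours(total: int, batch_size: int, interval_hours: int) -> int:
    """Number of batches (rounded up) multiplied by the interval between them."""
    batches = math.ceil(total / batch_size)
    return batches * interval_hours


def as_days_hours(hours: int) -> Tuple[int, int]:
    """Split a number of hours into (days, remaining hours)."""
    return divmod(hours, 24)


hours = estimate_hours(1222, 20, 1)
print(hours, as_days_hours(hours))
```

For 1222 connections at 20 per hour this gives 62 hours, i.e. 2 days and 14 hours, matching the message text.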

 

 

- Complete
7.04

Exports

  • For the PartnerFirm export add extra columns to the XLS export
    • Names of the PartnerFirm users who are connected to the LinkedInContact
  • Solve the multiple VCF export 
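
If "multiple VCF export" means putting many contacts into one .vcf file (an assumption, the item does not say), the usual approach is to write one vCard after another in the same file. A minimal Python sketch using the vCard 3.0 layout, with hypothetical function names and only the name fields filled in:

```python
from typing import Iterable, Tuple


def _esc(value: str) -> str:
    """Escape backslash, semicolon, comma and newline as vCard text values require."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))


def to_vcard(first: str, last: str) -> str:
    """Build one vCard 3.0 entry with CRLF line endings."""
    first, last = _esc(first), _esc(last)
    lines = [
        "BEGIN:VCARD",
        "VERSION:3.0",
        f"N:{last};{first};;;",          # structured name: family;given;additional;prefix;suffix
        f"FN:{first} {last}".strip(),    # formatted display name
        "END:VCARD",
    ]
    return "\r\n".join(lines) + "\r\n"


def to_vcf(contacts: Iterable[Tuple[str, str]]) -> str:
    """Concatenate one vCard per contact into a single .vcf payload."""
    return "".join(to_vcard(first, last) for first, last in contacts)


print(to_vcf([("David", "Lewis"), ("Jane", "Doe")]))
```

Some address books import only the first card from a multi-contact file, so the target applications (Outlook, phones) should be tested before settling on this.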
Pending
7.06

Scheduling the scrapes

  • Schedule the scrapes to run in batches (e.g. 20 at a time, once every 6 hours, i.e. 80 a day) to minimise the chance of LinkedIn security blocking the IP
    • I recognise that for someone like me with 1.5k contacts, the process will take ~20 days, but that is ok.
      • Perhaps email a progress report (and a file with a subset of the users) after each run with the updates so far (once we have fully tested this)
  • I have created a linked-inexport@atts-systems.com email address (password Descartes99) for the email to be sent from
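
The batch schedule described above can be planned up front. This is a sketch only, in Python, with a hypothetical plan_batches function; it assumes the first batch runs at the start time and each later batch runs one interval after the previous one. The actual job runner (cron or similar) is not covered.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def plan_batches(ids: List[int],
                 start: datetime,
                 batch_size: int = 20,
                 interval: timedelta = timedelta(hours=6)) -> List[Tuple[datetime, List[int]]]:
    """Return (run_time, ids_in_batch) pairs, one per batch, spaced by interval."""
    plan = []
    for n, i in enumerate(range(0, len(ids), batch_size)):
        plan.append((start + n * interval, ids[i:i + batch_size]))
    return plan


plan = plan_batches(list(range(1500)), datetime(2024, 1, 1))
print(len(plan), "batches, last run at", plan[-1][0])
```

With 1,500 contacts at 20 every 6 hours this comes to 75 batches, the last one starting 18.5 days after the first, which is in line with the ~20 day estimate.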
Pending