Coursera – Data to Insights – Data Deduplicating

I finished the 1st course and am on to the 2nd course, Data to Insights.  I haven’t really learned anything new that I haven’t already done in some other program like IBM SPSS Modeler.  The course is interesting, especially since the data set is the Google ecommerce data.  Seeing how to apply the techniques to an ecommerce data set is what I’ve found most interesting.

I was introduced to the term ‘data deduplication’.  After watching more videos in the course I realized I have done this before but never knew it had a name.  The easiest way of getting rid of duplicates is to use GROUP BY in the SQL.

Some new SQL features I’ve been introduced to are:

  • WITH
  • RANK
  • PARTITION BY
  • _TABLE_SUFFIX

I can take advantage of WITH to reduce the complexity of certain queries I have.  Looking forward to introducing it to a coworker who writes reports where the SQL query is over 200 lines.
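A minimal sketch of how WITH names an intermediate result so the main query stays short. It runs on Python's built-in sqlite3 rather than BigQuery, and the `orders` table, its columns, and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical orders table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 10.0), ("ann", 15.0), ("bob", 5.0)],
)

# The WITH clause gives the aggregation a name, so the outer
# query reads like a query against an ordinary table.
query = """
WITH customer_totals AS (
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
)
SELECT customer, total
FROM customer_totals
WHERE total > 20
ORDER BY customer
"""
big_spenders = conn.execute(query).fetchall()
print(big_spenders)
```

In a long report query, each nested subquery can be pulled up into its own named WITH block in the same way.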

The one concept that I found new was data deduplication.  I haven’t really had a use case for it, since the data sources I work with luckily haven’t had much duplicate data.  Learning about it should come in handy in the future when the situation arises.

WITH product_query AS (
  SELECT DISTINCT
    v2ProductName,
    productSKU
  FROM `data-to-insights.ecommerce.all_sessions_raw`
  WHERE v2ProductName IS NOT NULL
)

SELECT k.* FROM (
  # aggregate the rows for each SKU into an array and
  # only take 1 result
  SELECT ARRAY_AGG(x LIMIT 1)[OFFSET(0)] k
  FROM product_query x
  GROUP BY productSKU # this is the field we want deduplicated
);
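
The same keep-one-row-per-key idea can be sketched outside SQL. This is a minimal Python sketch, not the course's method; the helper name `dedupe_by_key` and the sample rows are made up for illustration. One difference worth noting: the query's ARRAY_AGG(x LIMIT 1) has no ORDER BY, so which row it keeps per SKU is not guaranteed, whereas this sketch always keeps the first row it sees.

```python
def dedupe_by_key(rows, key):
    """Keep the first row seen for each distinct value of `key`."""
    first_seen = {}
    for row in rows:
        # setdefault only stores the row if this key value is new
        first_seen.setdefault(row[key], row)
    return list(first_seen.values())


# Hypothetical sample rows; SKU-1 appears under two product names.
rows = [
    {"productSKU": "SKU-1", "v2ProductName": "Mug"},
    {"productSKU": "SKU-1", "v2ProductName": "Mug (blue)"},
    {"productSKU": "SKU-2", "v2ProductName": "Pen"},
]
unique_rows = dedupe_by_key(rows, "productSKU")
for row in unique_rows:
    print(row["productSKU"], row["v2ProductName"])
```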
Posted in Coding

Coursera – Exploring and Preparing your Data with BigQuery

Signed up for Coursera today.  What led me to sign up with them:

  1. Certificates for completing specialized tracks
  2. The course is by ‘Google Cloud’

Decided to start with Exploring and Preparing your Data with BigQuery.  This 4 week course is geared toward data analysts and business analysts.  I see it as a gateway to ML, a way to get the foundation before taking the leap.

Posted in BigQuery, Coding, Google

What are those command options? explainshell.com

If you’ve ever wondered what the options to a command mean, for example:

rsync -avh folder1 /path/to/remote/dir

Command: rsync

Options: avh

explainshell.com is the place to go to have those options deciphered.
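
As a rough illustration of what the site does with the bundle above, here is a small Python sketch that splits `-avh` into single flags and looks each one up. The meanings are paraphrased from the rsync man page; the lookup table and the `explain` helper are made up for this example and cover only these three flags.

```python
# Short descriptions paraphrased from the rsync man page (only these three flags).
RSYNC_FLAGS = {
    "a": "archive mode (recursive, preserves permissions, times, symlinks, etc.)",
    "v": "verbose output",
    "h": "human-readable numbers",
}


def explain(bundle):
    """Split a bundle like '-avh' into single flags and describe each one."""
    return [
        (f"-{flag}", RSYNC_FLAGS.get(flag, "unknown flag"))
        for flag in bundle.lstrip("-")
    ]


explained = explain("-avh")
for flag, meaning in explained:
    print(flag, meaning)
```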

explainshell

Posted in Coding

Way better technical problem site: Hackerrank.com

Screw the last post and InterviewBit.

HackerRank is way better.  The site and challenges feel more legit, with much more explanation of the question and the expected input and output.  On the 2nd question I totally missed the ability to run your code before submitting.  After using InterviewBit, I do miss the stopwatch showing how long it takes to solve a problem.  I’m also kind of torn as to whether you should be able to test your code or not.  Maybe give the ability to check once and then decrement points after that.  Time taken and # of attempts needed to check the answer would be a pretty big differentiator if I were a recruiter.

Add me on HackerRank: fangstar

Decided to do the tests in JS.  Haven’t done C/C++ or Java in a while.  It’d be a good place to brush up.

Someone at the FCC meetup always pointed me to https://www.codewars.com/ so I’ll check that out too.

Posted in Coding

Technical Interview example website: InterviewBit

Looking to brush up on technical interview questions.  Ran across this site, InterviewBit.  I’ve only been in the time complexity section; I didn’t like the learning videos at all, so I skipped to the basic primer questions.  It’s OK just to get some practice, I guess.  The site isn’t great but any kind of practice helps.  What I do like about the problems is that each one has a timer and gives you an expectation of how long you should be taking.

If you know of a better site, please share.

Posted in Coding

DigitalOcean Ubuntu 14 -> 16

Finally getting around to upgrading my DigitalOcean Ubuntu LAMP droplet from Ubuntu 14 -> 16.

Steps:

  1. Back up the db (I forgot this step!!). Ideally copy the db dump to your local machine
  2. Make a snapshot of the droplet
  3. sudo apt-get update
  4. sudo apt-get upgrade
  5. sudo apt-get install update-manager-core
  6. sudo do-release-upgrade

For DigitalOcean, the guide says not to worry about the SSH issue, so just type ‘y’ through the rest of the prompts.

I ran into a bunch of strange ‘error’ messages, ignored them, and it seemed like everything came out OK.  Hope it does for you too.

Reference:

https://www.digitalocean.com/community/tutorials/how-to-upgrade-to-ubuntu-16-04-lts

Posted in Coding

Configuration Management tool: Ansible

Been learning and playing with Ansible this week as a CM tool.

What do we use it for?

After you spin up an EC2 instance you can use Ansible to add users/groups, add ssh keys, and install packages, all from an Ansible playbook (script). It’s similar to Chef / Puppet.

If you have to provision multiple instances this can be a time saver. Also, say a new member has joined the team. How can you add this new member to all your instances without doing it manually? Ansible solves this because you can create a playbook that adds the user to all hosts easily.

I bricked an instance by corrupting its /etc/sudoers file. That was bad, but I learned there is a way to validate any changes to that file.

A better practice, at least on Ubuntu, is to leave /etc/sudoers alone and create new files under /etc/sudoers.d/ instead.
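
One way to validate a sudoers file before it goes live is visudo's check-only mode (`-c`) pointed at a specific file (`-f`). Below is a minimal Python sketch, assuming visudo is installed on the host; the path `/etc/sudoers.d/deploy` and both helper names are hypothetical. Ansible's file-writing modules such as copy and template also accept a `validate` option that can run this kind of check before replacing the file.

```python
import subprocess


def sudoers_check_command(path):
    """Build the visudo command that syntax-checks `path` without installing it."""
    return ["visudo", "-c", "-f", path]


def is_valid_sudoers(path):
    """Return True if visudo accepts the file (needs visudo on the host)."""
    result = subprocess.run(
        sudoers_check_command(path), capture_output=True, text=True
    )
    return result.returncode == 0


# Hypothetical drop-in file; only the command is printed here, nothing is run.
check_cmd = sudoers_check_command("/etc/sudoers.d/deploy")
print(" ".join(check_cmd))
```

Calling `is_valid_sudoers` on a candidate file before copying it into /etc/sudoers.d/ would catch a syntax error before it can lock anyone out.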

Posted in Coding

Redmine as a Help Desk Solution

I’ve gotten decent at understanding a lot of Redmine’s features.  I have been tasked with customizing and configuring Redmine as an internal IT Help Desk solution.  If you install a theme and a bunch of plugins, and change the layout, it can be a usable solution.  The best part is that internal users can finally use email to generate tickets.

If you need a Help Desk solution that involves external users, I’d recommend the RedmineUp Help Desk plugin.

The feedback process for the Help Desk implementation is slow going, so I looked into Redmine + Docker.  I’d never done a Docker project before, so it was pretty fun.  It took a little while, but I figured out how to replicate the project in Docker.

Posted in Web

Lesson Learned: O365 cloud mail nameserver change

Was a little careless today while helping someone transfer their GoDaddy domain and O365 mail to a new server (changing their nameserver).  When I looked at the domain’s DNS entries in GoDaddy, I saw a lot of entries that looked related to O365.  I assumed those entries would stay when I moved the domain’s nameserver, so I didn’t even save the info.  Boy was I wrong: after I changed the nameserver, all the entries disappeared.

It took me a little while to figure out how to set up the DNS entries needed.  Not sure why, but it took me a while to find the instructions on GoDaddy.  Eventually I got it going, but I felt pretty bad that I had probably prevented emails from being received.

Definitely a lesson learned.  Need to be more careful in the future.

Posted in DevOps, troubleshooting

CodeIgniter base project

If you are starting a CodeIgniter project, I’d recommend using

https://github.com/kenjis/codeigniter-composer-installer

as your base.  Composer makes things so much easier.  Then for user management and authentication, use Ion_Auth (GitHub).

If you need LDAP, Adldap2 is really easy to use.

From the kenjis GitHub page, you can find a lot of other packages.  The other package I’ve used a lot is the REST Server.  However, if all you need is a REST server, I’d recommend using NodeJS.

Posted in Coding, php