PowerShell - Sum similar entries from multiple CSV files

2014/09/01 | 1 minute read |

One of my script is scheduled to download everyday the proxy logs files from multiple proxies (Approx 1>2GB per file) of the previous day. The second step is to parse each of them and get the top 200 domain names within a specific environment. Finally at the end of the month another script create a report on the monthly internet usage.

I thought this was an interesting exercise even if some tools would probably do a better job ($$$). Also we shouldn’t take those results too seriously since some protocols like Ajax or HTML5 talk a lot to the servers, keep refreshing pages even if you are not actively working on them.

In this post, I will talk about the last part of this process and how I combine all those files to get a real monthly top domains.

Example of daily report

Here is a sample of daily CSV file result after the parsing has done its job:

So at the end of the month I have a bunch of CSV files that were generated during the months and I need to know how many time each domains were accessed.

One-Liner Magic!

To calculate the Sum of each domains for each months I can simply use the following One-Liner:

Get-ChildItem -Path .\*.csv | # Get each CSV files
    ForEach-Object -Process {
        Import-Csv -Path $PSItem.FullName # Import CSV data
    } | 
    Group-Object -Property Name | # Group per Domain Name
    Select-Object -Unique -Property Name, @{
        Label = "Sum";
        Expression = {
            # Sum all the counts for each domain
            ($PSItem.group | Measure-Object -Property Count -sum).Sum
        }
    } |
    Sort-Object -Property Sum -Descending |
    Out-GridView -Title "Top Domains"

Step by Step

Get all the CSV File

```

```

Now we import the data inside each file. I added a Sort-Object so you can see that we have duplicates (since we have the same values coming from different files).

Using Group-Object by Domain Name. This will group also all the different “counts” for each domain

Finally we use Select-Object to show a unique Domain Name and Sum each counts present in the "Group" property from the Group-Object Cmdlet (see above).