This is the third post of a series that covers everything about importing all files in a folder into Excel. Click here to see the series.
The previous scenario covered how to import all Excel files in a folder, however, for that example we assumed that all files had the right format and were ready to be appended one on top of each other.
But, what if we need to clean the files before importing them?
For example, look at the file below:
If we are interested in getting the information below row 8, we must remove the first 8 rows from each file before importing them.
Let’s do it!
Download the files from here to follow along.
The process to do this is as follows:
Step 1: Import a single file
Go to Data → New Query → From File → From Workbook
Browse for the file and click on Import
Then select the sheet or table to import and click Edit.
Step 2: Perform the cleaning process
After the file is imported, we see that Power Query got rid of the first two rows already. Therefore, we only need to remove the first 6.
In this case, we will only remove the top 6 rows, but you can perform any other data transformation steps you deem necessary.
To remove the top 6 rows, go to Home → Remove Rows → Remove Top Rows
Then enter the number of rows to remove and press OK.
Then we need to promote the headers, go to Home → Use First Row as Header
Finally, change the format of the Date column to date (Right-click the header of the Date Column, go to Change Type → Date
That’s all.
The next step is to create a function with the query we just created.
Step 3: Create a function
To create a function, go to View → Advanced Editor
In the Advanced Editor, you will notice that the first line contains the function File.Contents and a fixed path and filename.
To create the function, we must replace the section in the rectangle above with the parameter name, see below.
The name of the parameter goes at the top surrounded by parenthesis and followed by =>
Once you create the function click on Done.
Then go to the query name box and rename the function with a new name, e.g., fxGetData.
Finally, go to File → Close & Load
You should be able to see the function in the Workbook Queries pane along with other queries you have in the workbook.
Step 4: Apply the function to all files in the folder
Go to Data → New Query → From File → From Folder
Click on ‘Browse’ and browse for the folder that contain the files, then click OK.
Once you click OK, press Edit on the next window.
We’re only interested in the Content column, therefore, right-click on the header of the Content column and select Remove Other Columns.
The go to Add Custom Column → Invoke Custom Function
Select the function (fxGetData), enter a name for the new column (FileContent), select the columns to pass to the function (Content), and click OK.
If you click on the tables, you’ll be able to see the contents of each one.
Step 5: Expand the contents and load to Excel
Then click on the Expand button in the FileContent column and click OK in the Expand dialog box.
In the next dialog box, make sure to un check “Use original column name as prefix”.
Right-click on the header of the column and select Remove
Change the type of the Date column to date and load the data to the workbook.
That’s All!!!
Click here if you want to see the other posts in the series.
Want to continue learning about automating your data preparation processes?? Subscribe to the blog.
© Master Data Analysis All Rights Reserved 2024