Databricks | No Need To Skip Rows Before Header Row while reading a CSV File

Man! The past couple of weeks have been really tough: hardcore development on Azure Data Factory and Azure Databricks, as we are up against a tight deadline (again :-) ). Loads of different scenarios and loads of new learnings. Sharing one below, so keep reading.

We are receiving a source file (let's call it Test.csv) which has a blank row before the header row:

    1
    2  "colname1", "colname2"
    3  "value1","value2"

We are using spark.read.format to load this into a data frame. Looking at the file contents, one would assume that you need to somehow skip the first blank row, so I began researching it. I found that spark.read.format does not provide any such property. After spending a couple of hours with no major breakthrough, I thought of testing the code as it is:

    val rawdataframe = spark.read.format("csv").option("header","true").option("inferSchema","true").option("delimiter", s","
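The snippet above is cut off, so here is a minimal sketch of how the full read might look. The options mirror the snippet; the load path (/mnt/raw/Test.csv) is a hypothetical placeholder, not taken from the post, and should point at wherever Test.csv actually lands in your storage.

    // Minimal sketch -- the load path is a hypothetical placeholder, not from the post
    val rawdataframe = spark.read
      .format("csv")
      .option("header", "true")        // first non-blank row becomes the header
      .option("inferSchema", "true")   // let Spark infer the column types
      .option("delimiter", ",")
      .load("/mnt/raw/Test.csv")       // hypothetical path; adjust to your storage location

    rawdataframe.show()
    // As the post title suggests, the leading blank row is simply ignored by the CSV
    // reader, so colname1 and colname2 come through as the header columns without any
    // explicit row-skipping logic.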