Lessons learned from Scala programming
I had some Scala to write in a recent Spark-related project: specifically, a Scala UDF (user-defined function) to be used in a Spark application.
My impression is that Try, Success, and Failure are not all that useful until they are combined with a for comprehension, which unpacks the successful values automatically. Consider the following code example (inspired by a StackOverflow post):
```scala
import java.sql.Timestamp

import org.apache.spark.sql.functions.udf

import scala.util.{Failure, Success, Try}

// Combine time_date ("yyyyMMdd") and time_time ("H:M:S:ms", local time at
// UTC+8) into a timestamp in UTC.
val time_to_ts: ((String, String) => Option[Timestamp]) = (time_date, time_time) => {
  val ts = for {
    time_time_a <- Try(time_time.split(':'))                              // NPE if time_time is null
    y           <- Try(time_date.take(4).toInt)                           // toInt exception
    m           <- Try(time_date.take(6).takeRight(2).toInt)              // toInt exception
    d           <- Try(time_date.take(8).takeRight(2).toInt)              // toInt exception
    tsymd       <- Try(Timestamp.valueOf(s"$y-$m-$d 0:0:0.0").getTime())  // parsing exception
    h           <- Try(time_time_a(0))                                    // index exception
    min         <- Try(time_time_a(1))                                    // ditto
    s           <- Try(time_time_a(2))                                    // ditto
    ms          <- Try(time_time_a(3))                                    // ditto
    tshmsms     <- Try(Timestamp.valueOf(s"1970-1-1 $h:$min:$s.$ms").getTime()) // parsing exception
    ts          <- Try(new Timestamp(tsymd - 3600 * 1000 * 8 + tshmsms))  // constructor exception
  } yield ts

  ts match {
    case Success(timestamp) => Some(timestamp)
    case Failure(_)         => None
  }
}

val getTs = udf(time_to_ts)
```
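The reason this works is that the for comprehension desugars into a chain of `flatMap`/`map` calls on `Try`, so the first `Failure` short-circuits everything after it and no nested try/catch is needed. A minimal sketch of the equivalence (the literals are made up for illustration):

```scala
import scala.util.Try

// This for comprehension ...
val viaFor = for {
  a <- Try("12".toInt)
  b <- Try("oops".toInt) // Failure: the chain stops here
} yield a + b

// ... is sugar for nested flatMap/map calls, so the yield never runs:
val viaFlatMap = Try("12".toInt).flatMap(a => Try("oops".toInt).map(b => a + b))

assert(viaFor.isFailure && viaFlatMap.isFailure)
```

To apply the registered UDF, something like the following should work, assuming a DataFrame `df` with string columns `time_date` and `time_time` (the DataFrame and output column names here are placeholders):

```scala
import org.apache.spark.sql.functions.col

val withTs = df.withColumn("event_ts", getTs(col("time_date"), col("time_time")))
```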
Why not use built-in functions?
Because the Spark built-in date/time and timestamp functions, when used to convert a string into an internal timestamp, lose the **milliseconds** field, which is critical to our use case.
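For example, `unix_timestamp` parses a string into whole seconds since the epoch (a `bigint`), so any millisecond field in the input is truncated; a small sketch of the problem, where `df` and the `time_str` column are placeholders:

```scala
import org.apache.spark.sql.functions.{col, unix_timestamp}

// unix_timestamp returns whole seconds since the epoch, so the
// ".SSS" part of the input is parsed but then discarded:
val parsed = df.withColumn(
  "ts_no_millis",
  unix_timestamp(col("time_str"), "yyyy-MM-dd HH:mm:ss.SSS").cast("timestamp")
)
```

This is why the hand-rolled UDF above builds the `Timestamp` itself and keeps the milliseconds.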