Wednesday, March 1, 2023

Mapping data flows in Azure Data Factory

 

Overview:

Data flows are a feature of Azure Data Factory that allows data engineers to develop data transformation logic graphically, without writing code. The resulting data flows are executed as activities within Azure Data Factory pipelines that use scaled-out Apache Spark clusters. Data flows run on ADF-managed execution clusters for scaled-out data processing; ADF internally handles all the code translation, Spark optimization, and execution of the transformations. Data flow activities can be operationalized via the existing Data Factory scheduling, control flow, and monitoring capabilities.
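To make the "operationalized via pipelines" point concrete, here is a minimal sketch using the azure-mgmt-datafactory Python SDK to trigger and then check on a pipeline that contains a data flow activity. The subscription, resource group, factory, and pipeline names are hypothetical placeholders, and exact method signatures can vary between SDK versions.

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient

    # Hypothetical subscription, resource group, factory, and pipeline names;
    # the pipeline is assumed to contain a data flow activity.
    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # Scheduling/control flow: kick off a run of the pipeline
    run = client.pipelines.create_run("my-rg", "my-factory", "MyDataFlowPipeline")

    # Monitoring: query the run's current status by its run ID
    status = client.pipeline_runs.get("my-rg", "my-factory", run.run_id).status
    print("pipeline run", run.run_id, "is", status)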

There are two types of Data flows:

  • Mapping Data Flow
  • Wrangling Data Flow

We start our discussion with Mapping Data Flows.

Mapping Data Flow –

  • Mapping data flows are visually designed data transformations in Azure Data Factory.
  • When you need to perform transformations that combine two or more datasets, you use a mapping data flow.
  • You can apply several transformations such as Filter, Join, Aggregate, Union, Lookup, and Sort using mapping data flows (a rough PySpark sketch of a few of these follows this list).
  • Mapping data flows can be executed within ADF pipelines using data flow activities.
  • Azure Data Factory handles the code translation and execution of mapping data flows behind the scenes.
  • A mapping data flow can be created individually or from within an Azure Data Factory pipeline.
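As a rough illustration of what ADF generates behind the scenes, the PySpark sketch below approximates a data flow with Filter, Join, and Aggregate transformations. The file and column names (employees.csv, dept_id, salary, and so on) are invented for illustration; the actual Spark code ADF produces is managed internally and is not exposed in this form.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("mapping-data-flow-sketch").getOrCreate()

    # Source transformations: two hypothetical CSV datasets
    emp = spark.read.option("header", True).csv("employees.csv")
    dept = spark.read.option("header", True).csv("departments.csv")

    # Filter transformation: keep only active employees
    active = emp.filter(F.col("status") == "active")

    # Join transformation: enrich employees with department details
    joined = active.join(dept, on="dept_id", how="inner")

    # Aggregate transformation: average salary per department
    # (cast to double because CSV columns are read as strings)
    result = joined.groupBy("dept_name").agg(
        F.avg(F.col("salary").cast("double")).alias("avg_salary"))

    # Sink: write the transformed output
    result.write.mode("overwrite").csv("output/")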

Steps to create a Mapping Data Flow:

  1. Open Azure Data Factory in the Azure portal, then click Author & Monitor.


  2. Click the Author button, then click the Data flows option. Click the three dots and select the New data flow option.

  3. Select Mapping Data Flow and click OK.


Steps to build transformation logic in the data flow canvas:

Once you create your Data Flow, you’ll be automatically sent to the data flow canvas.

Note: The assumption is that you are already familiar with the basic building blocks of a data factory, such as creating linked services and pipelines.

  1. In the data flow canvas, add a source by clicking the Add Source box.
    a. Name your source. Click New to create a new source dataset.


    b. Choose Azure Blob Storage. Click Continue.


    c. Choose DelimitedText. Click Continue.


    d. Name your dataset, say empDataDataset. In the linked service dropdown, choose an existing linked service if you have already created one, or click the + New button to create a new linked service.

     

    e. Once you’re back at the dataset creation window, choose your file path. As the CSV file has headers, check First row as header. Select From connection/store to import the header schema directly from the file in storage. Click OK when done. (A rough SDK equivalent of this dataset definition appears after these steps.)


    f. To add a transformation, click the + icon next to your source node on the data flow canvas.

    You can see there are various transformations available in the data flow.
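For readers who prefer code over clicks, here is a hedged sketch of how the DelimitedText dataset from steps d and e might be created with the azure-mgmt-datafactory Python SDK. The names (my-rg, my-factory, MyBlobLinkedService, the container and file) are placeholders, and the model constructors differ slightly across SDK versions.

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        AzureBlobStorageLocation,
        DatasetResource,
        DelimitedTextDataset,
        LinkedServiceReference,
    )

    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # Hypothetical names mirroring the UI walkthrough above
    dataset = DelimitedTextDataset(
        linked_service_name=LinkedServiceReference(reference_name="MyBlobLinkedService"),
        location=AzureBlobStorageLocation(container="data", file_name="emp.csv"),
        column_delimiter=",",
        first_row_as_header=True,   # the "First row as header" checkbox
    )
    client.datasets.create_or_update(
        "my-rg", "my-factory", "empDataDataset", DatasetResource(properties=dataset)
    )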

Monday, February 13, 2023

Overview of AI techniques


 

Artificial Intelligence (AI) is the practice of making computing hardware and software think intelligently, similar to the way humans use natural intelligence.

What is Artificial Intelligence?

The range of tasks that computers are capable of performing has increased rapidly since they were first created. Humans have steadily enhanced the power of computer systems, expanding their working domains, increasing their speed, and shrinking their size over time.

Artificial intelligence is a subfield of computer science that aims to build machines or computers that are as intelligent as people.

According to John McCarthy, known as the father of artificial intelligence, AI is “the science and engineering of making intelligent machines, especially intelligent computer programs.”

Artificial intelligence is a technique for teaching a computer, a robot that is controlled by a computer, or a piece of software to think intelligently, much as intelligent people do.

It is possible to create intelligent software and systems by first studying how the human brain works, as well as how people learn, make decisions and collaborate when attempting to solve a problem.

If given enough information, machines are capable of performing human-like actions. Consequently, knowledge engineering is crucial to artificial intelligence: to perform it, the relationships between objects and their properties must be established.

An AI technique, then, is a method applied to this body of available knowledge that organizes and uses it as efficiently as possible, so that the knowledge is:

  • Easily modifiable to correct errors
  • Useful in many situations even if it is incomplete or somewhat inaccurate
  • Clear and understandable to the people who provide it
  • Tied to a clear purpose

The main techniques used in artificial intelligence are described below.


Top 4 AI techniques

1.   Machine Learning (ML)

Applications that learn from experience and increase their prediction or decision-making accuracy over time are the focus of machine learning.

A subset of machine learning called "Deep Learning" uses artificial neural networks for predictive analysis. Machine learning uses a variety of approaches, including supervised learning, unsupervised learning, and reinforcement learning. In unsupervised learning, the algorithm works without labelled data and must find structure in the inputs on its own. In supervised learning, a function mapping input objects to intended outputs is inferred from labelled training data. In reinforcement learning, machines learn which actions to take by trial and error, choosing the actions that improve a reward signal.
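A tiny scikit-learn example makes the supervised case concrete: the model infers a function from labelled (input, intended output) pairs, and its accuracy can be measured on held-out data. The dataset and the choice of model here are illustrative only.

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Supervised learning: labelled examples of inputs and intended outputs
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)               # infer a function from training data
    print("held-out accuracy:", model.score(X_test, y_test))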


2.  Natural Language Processing (NLP)

Building machines that comprehend and react to text or voice data and answer with text or speech of their own in a manner akin to that of humans is the goal of natural language processing.

Computers are programmed to process natural languages in the context of interaction with human language. Natural Language Processing, which extracts meaning from human languages through machine learning, is a proven technique. In NLP, a machine records the audio of a person speaking, the dialogue is converted from audio to text, and the text is then processed; the response is finally converted back into audio so the machine can reply to people with speech.

Applications of NLP can be found in Interactive Voice Response (IVR) systems used in call centres, in language translators like Google Translate, and in word processors that check the correctness of syntax in text, like Microsoft Word.

However, natural language processing is challenging because the rules that govern communication in natural language are difficult for computers to understand. In order to translate unstructured data from human languages into a format that the computer can understand, NLP employs algorithms to recognise and abstract the rules of natural languages (a small illustration follows below). Additionally, NLP is used in content-improvement tools such as paraphrasing apps, which enhance the readability of difficult text.
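As a small illustration of turning unstructured text into a machine-readable format, the bag-of-words sketch below uses scikit-learn. The sentences are made up, and real NLP systems use far richer representations than raw word counts.

    from sklearn.feature_extraction.text import CountVectorizer

    # Two toy "documents" of unstructured human language
    docs = ["the call centre answered my call",
            "please translate this text for me"]

    vectorizer = CountVectorizer()
    bag_of_words = vectorizer.fit_transform(docs)    # text -> numeric matrix

    print(vectorizer.get_feature_names_out())        # the learned vocabulary
    print(bag_of_words.toarray())                    # word counts per document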

 

3.    Automation and Robotics

Robotics and automation cover expert systems and machines that are able to perform tasks assigned by a human.

They have sensors to pick up information from the outside world, such as temperature, movement, sound, heat, pressure, and light, which is processed so they can act intelligently and learn from their mistakes.

The goal of automation is to have machines perform boring, repetitive jobs, increasing productivity and delivering more effective, efficient, and affordable results. In order to automate processes, many businesses use machine learning, neural networks, and graphs. By leveraging CAPTCHA technology, such automation can prevent fraud problems during online financial transactions. Robotic process automation is designed to carry out high volume, repetitive jobs while being flexible enough to adapt to changing conditions.

 

4.    Machine Vision (MV)

Machine vision is the technology and the set of procedures used to deliver imaging-based automatic inspection and analysis for applications such as automatic inspection, process control, and robot guidance, typically in industrial settings.

Machines are capable of collecting and analysing visual data. Cameras are used to record visual data, the image is converted from analogue to digital, and the result is processed using digital signal processing. The resulting data is then fed into a computer. Two essential characteristics of machine vision are sensitivity, the ability of the machine to detect weak signals, and resolution, the extent to which it can distinguish between objects. Machine vision is used in a variety of applications, including pattern recognition, medical image analysis, and signature detection.
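To show the flavour of imaging-based inspection, here is a toy NumPy sketch: a tiny, already-digitised grayscale "image" is segmented with a simple intensity threshold, the kind of primitive step on which inspection pipelines build. The pixel values and threshold are invented for illustration.

    import numpy as np

    # A tiny 8-bit grayscale "image" (already converted from analogue to digital)
    image = np.array([[ 12,  40, 200],
                      [ 90, 180, 255],
                      [  5,  60, 130]], dtype=np.uint8)

    # Simple automatic inspection: flag pixels brighter than a threshold
    threshold = 128
    flagged = image > threshold
    print(flagged.astype(int))   # 1 = flagged pixel, 0 = background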

Thursday, October 17, 2013

Introduction to Cloud Computing Windows® Azure™



  •  Wikipedia defines cloud computing as: 

       “Internet-based computing, whereby shared resources, software and information are provided to computers and other devices on-demand, like the electricity grid.”

       The idea behind cloud computing is to access software applications and/or data from resources available through the Internet with a simple browser.

       In cloud computing, you pay for the resources (applications and data) you use, as you go.

  •  Key reasons for shifting to cloud computing include cost savings, improved scalability and reliability, and availability of applications/data anywhere.
Why is it important?
  •  Companies incur a capital expenditure when spending money on a fixed asset like a data center or hardware.

Types of Cloud Computing Services


Windows Azure
  •   Windows Azure is a cloud platform that enables you to quickly build, deploy and manage applications across a global network of Microsoft-managed datacenters. 
  •   You can integrate your public cloud applications with your existing IT environment.
  •   Windows Azure delivers a 99.95% monthly SLA and enables you to build and run highly available applications without focusing on the infrastructure.
  •   It provides automatic OS and service patching, built in network load balancing and resiliency to hardware failure. 
  •   You can use any language, framework, or tool to build applications. Features and services are exposed using open REST protocols. The Windows Azure client libraries are available for multiple programming languages, and are released under an open source license and hosted on GitHub.
Windows Azure Architecture
  •  The hypervisor (a Microsoft-optimized Hyper-V) runs on each server. It manages and controls the virtual servers running on the physical server.
  •   Each virtual partition in the Microsoft cloud runs a modified version of Windows Server 2008 R2 Enterprise Edition.
Windows Azure cloud computing platform (PaaS).

 
 

The Fabric Controller functions as the kernel of the Azure platform. It provisions, stores, delivers, monitors, and commands the virtual machines (VMs) and physical servers that make up Azure.



AppFabric (formerly .NET Services) provides many enterprise-level services to include access control, caching and distributed messaging via a service bus.
  •  Microsoft provides the hardware and software to host your applications, more formally known as services, and data.
       Web site
       Computational service
  •       You can access the data via an HTTP API from inside or outside the data centers.
  •       You pay for the computational processing and storage in Windows Azure on a consumption model; that is, you pay as you go, for what you use.
Windows Azure Storage

Tables, Blob storage, Queues & SQL Azure
  •   You access Windows Azure Storage (any of the three types) through a REST API over HTTP (a small request sketch follows this list).
  •  The table, blob, and queue storage types allow data to be stored in a non-relational form.
  •   Table storage holds structured tables similar to what you would find in a relational database, but without indexes and relationships.

  •   Binary Large Object (Blob) storage houses large binary data such as images, videos, music, documents
  •   SQL Azure supports relational database needs.
  •   SQL Azure uses a special version of Microsoft SQL Server as its backend.
  • There are differences between SQL Azure and SQL Server 2008; a number of limitations and unsupported features are found in SQL Azure.
  •   You access SQL Azure data with the standard tools (such as Management Studio) and APIs (such as ADO.NET and ODBC) that work with SQL Server.
  •   SQL Azure supports typical ACID transactions for work within the same database.
  •   SQL Azure is more expensive than Windows Azure Storage, and in particular table storage.
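As a sketch of the REST-over-HTTP access model, the request below fetches a hypothetically public blob with plain HTTP. A private blob would additionally require a signed Authorization header or a SAS token, which is omitted here; the account, container, and blob names are placeholders.

    import requests

    # Placeholder account/container/blob names; a real private blob needs
    # a signed Authorization header or a SAS token appended to the URL.
    url = "https://myaccount.blob.core.windows.net/mycontainer/report.csv"

    response = requests.get(url)                 # REST: GET the blob over HTTP
    response.raise_for_status()
    print(response.headers.get("Content-Type"))
    print(len(response.content), "bytes downloaded")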

SQL Azure Architecture
  •   While SQL Azure is based on SQL Server, the architecture is such that you are not connecting directly to SQL Azure as you might connect to SQL Server.
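Because you connect through a gateway at *.database.windows.net rather than directly to a SQL Server instance, a standard ODBC connection string is essentially all that changes for client code. The pyodbc sketch below uses placeholder server, database, and credential values; the driver name depends on what is installed locally.

    import pyodbc

    # Placeholder server/database/credentials; note the gateway-style
    # server name rather than a direct SQL Server instance name.
    conn = pyodbc.connect(
        "Driver={ODBC Driver 17 for SQL Server};"
        "Server=tcp:myserver.database.windows.net,1433;"
        "Database=mydb;Uid=myuser;Pwd=mypassword;Encrypt=yes;"
    )

    cursor = conn.cursor()
    cursor.execute("SELECT TOP 5 name FROM sys.tables")   # ordinary T-SQL works
    for row in cursor.fetchall():
        print(row.name)
    conn.close()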
