Embedded Self Transcribed Image Text Must Unclear Getting Ridiculous Going Unsubscribe Ser Q29685991
***HERE IS THE EMBEDDED and SELF- TRANSCRIBED IMAGE TEXT, SO ITMUST NOT BE THAT UNCLEAR! THIS IS GETTING RIDICULOUS and I am goingto unsubscribe from the service, if the transcription service canread it , why can’t you?
Show transcribed image text70-511
: Statistical Propramming ProgianmingAssignrien 2-k-MeansClustering You are to creale a prograrn usirng Python hat does thefullowing: Asks the user for a filename which contains the pointdata which is to be clustered (see Data File Format section fordetails). 1. 2. Asks the user for the narme af the output file. 3Asks the user for the number of dusters. This is the parameterkthat will be used for k means. 4. Read the Input file and storesthe points Into a list 5. Applies the k-means alporithm to find thecluster for each point. G. Displays the polints that cach dustercontains after cach itcration of the algorithm . Writes the finalcluster assignments to the output file. Clustering is a process ofidentifying groupinsi.e. clusters within the data. For example, thefipure below shows three clusters of two dimensional data points(Xs]: YOU CANNOT USE ANY PYTHON PACKAGES FOR THIS PROGRAM (NUMPY,PANDAS, .) NO IMPORT STATEMENTS. The name of your source code fileshould he kMeans. py. All your code shauld be within a single file.Your code should follow good coding practices, including good uscof whitespacc and usc of hoth inline and block comments You need touse meaningful identifier names that conform to standard namineconventions. At the top of cach file, you need to put In a blockcomment with the following Information: your name, date, coursename, semester, and assignment name. The output of your programshould exactly match the sample program output given at the end. 1.2. 3. 4. 5. xxX Data File Format Let N be the number of points andPi to he the value of point i. The input file shauld be of thefollowing format: P1 P2 Clustering has many applications, includinginferring population structures from genetic data, recognizingcommunities within social networks, or segmenting of customers formarket research. PN One of the most popular algorithms forperforming clustering is the k-means method. The algorithm dependson the natian of distance hetween two points. For points with onlyone dimension (just single values), we can define the distancebetween two points p and q as Example: 1.2 2.1 4.56 2.113 What toTurn In You will turn in the single kMeans.py file using BlackBoardThe k-means algorithm will work by placing points into clusters andcomputing their centroids, which is defined as the average of thedata points in the cluster. Specifically, the alporithm works asfollows: HINTS 1. Pick k, the number of clusters. 2. Initializedusters by picking one point [centroid) per cluster. For thisassigriment, you can pick the firstk Make use of listcomprehensions for reading lines from a file and then convertingthe strings into a list of floats points as initial centroids foreach corresponding dluster 3. For cach point, place it in thecluster whose current centrold it is nearest. . After all pointsare assigned, update the locations of centroids of the k clusters5. Reassign all points to their closest centroid. This sometimesmoves points between clusters 6. Repeat 4,5 untill convergence,Convergence occurs when points don’t move between clusters and .Usepwdl to check the directory where you should place your input file.. Use a dict data structures for storing centrolds and clusters.The centrolds dict will be a mapping from cluster number tocentroids. The clusters dict will be a mapping from cluster numberto a list of points in the cluster centroids stabilize
the data set is- prog2-input-data.txt
1.84.51.12.19.87.611.323.20.56.5
Sample Output below:
70-511: Statistical Propramming ProgianmingAssignrien 2-k-Means Clustering You are to creale a prograrn usirng Python hat does the fullowing: Asks the user for a filename which contains the point data which is to be clustered (see Data File Format section for details). 1. 2. Asks the user for the narme af the output file. 3 Asks the user for the number of dusters. This is the parameter kthat will be used for k means. 4. Read the Input file and stores the points Into a list 5. Applies the k-means alporithm to find the cluster for each point. G. Displays the polints that cach duster contains after cach itcration of the algorithm . Writes the final cluster assignments to the output file. Clustering is a process of identifying groupinsi.e. clusters within the data. For example, the fipure below shows three clusters of two dimensional data points (Xs]: YOU CANNOT USE ANY PYTHON PACKAGES FOR THIS PROGRAM (NUMPY, PANDAS, .) NO IMPORT STATEMENTS. The name of your source code file should he kMeans. py. All your code shauld be within a single file. Your code should follow good coding practices, including good usc of whitespacc and usc of hoth inline and block comments You need to use meaningful identifier names that conform to standard namine conventions. At the top of cach file, you need to put In a block comment with the following Information: your name, date, course name, semester, and assignment name. The output of your program should exactly match the sample program output given at the end. 1. 2. 3. 4. 5. xxX Data File Format Let N be the number of points and Pi to he the value of point i. The input file shauld be of the following format: P1 P2 Clustering has many applications, including inferring population structures from genetic data, recognizing communities within social networks, or segmenting of customers for market research. PN One of the most popular algorithms for performing clustering is the k-means method. The algorithm depends on the natian of distance hetween two points. For points with only one dimension (just single values), we can define the distance between two points p and q as Example: 1.2 2.1 4.56 2.113 What to Turn In You will turn in the single kMeans.py file using BlackBoard The k-means algorithm will work by placing points into clusters and computing their centroids, which is defined as the average of the data points in the cluster. Specifically, the alporithm works as follows: HINTS 1. Pick k, the number of clusters. 2. Initialize dusters by picking one point [centroid) per cluster. For this assigriment, you can pick the firstk Make use of list comprehensions for reading lines from a file and then converting the strings into a list of floats points as initial centroids for each corresponding dluster 3. For cach point, place it in the cluster whose current centrold it is nearest. . After all points are assigned, update the locations of centroids of the k clusters 5. Reassign all points to their closest centroid. This sometimes moves points between clusters 6. Repeat 4,5 untill convergence, Convergence occurs when points don’t move between clusters and .Use pwdl to check the directory where you should place your input file. . Use a dict data structures for storing centrolds and clusters. The centrolds dict will be a mapping from cluster number to centroids. The clusters dict will be a mapping from cluster number to a list of points in the cluster centroids stabilize Show transcribed image text
"Looking for a Similar Assignment? Get Expert Help at an Amazing Discount!"
